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ABSTRACT 


Title  of  Dissertation:  Wavelets  and  Time- Frequency  Methods  in 

Linear  Systems  and  Neural  Networks 
Yagyensh  C.  Pati,  Doctor  of  Philosophy,  1992 

Dissertation  directed  by:  Professor  P.  S.  Krishnaprasad 

Department  of  Electrical  Engineering 

In  the  first  part  of  this  dissertation  we  consider  the  problem  of  rational  approx¬ 
imation  and  identification  of  stable  linear  systems.  Affine  wavelet  decompositions 
of  the  Hardy  space  H2(n+),  are  developed  as  a  means  of  constructing  rational  ap¬ 
proximations  to  nonrational  transfer  functions.  The  decompositions  considered  here 
are  based  on  frames  constructed  from  dilations  and  complex  translations  of  a  single 
rational  function.  It  is  shown  that  suitable  truncations  of  such  decompositions  can 
lead  to  low  order  rational  approximants  for  certain  classes  of  time-frequency  localized 
systems.  It  is  also  shown  that  suitably  truncated  rational  wavelet  series  may  be  used 
as  ‘linear-in-parameters’  black  box  models  for  system  identification.  In  the  context  of 
parametric  models  for  system  identification,  time-frequency  localization  afforded  by 
affine  wavelets  is  used  to  incorporate  a  priori  knowledge  into  the  formal  properties 
of  the  model.  Comparisons  are  made  with  methods  based  on  the  classical  Laguerre 
filters. 

The  second  part  of  this  dissertation  is  concerned  with  developing  a  theoretical 
framework  for  feedforward  neural  networks  which  is  suitable  for  both  analysis  and 
synthesis  of  such  networks.  Our  approach  to  this  problem  is  via  affine  wavelets  and 
the  theory  of  frames.  Affine  frames  for  L2,  are  constructed  using  combinations  of 
sigmoidal  functions  and  the  inherent  translations  and  dilations  of  feedforward  net¬ 
work  architectures.  Time-frequency  localization  is  used  in  developing  methods  for  the 
synthesis  of  feedforward  networks  to  solve  a  given  problem. 

These  two  seemingly  disparate  problems  both  lie  within  the  realm  of  approximation 
theory,  and  our  approach  to  both  is  via  the  theory  of  frames  and  affine  wavelets. 
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Chapter  1 


Introduction 


As  is  perhaps  suggested  by  the  title,  this  dissertation  is  composed  of  two  somewhat 
distinct  parts.  The  first  part  (Chapters  3  and  4)  is  concerned  with  the  approximation 
and  identification  of  causal  linear  time-invariant  systems  using  time-frequency  local¬ 
ized  rational  (wavelet)  representations.  In  Section  1.2  below  we  give  a  brief  description 
of  the  problem  and  an  outline  of  the  work  done. 

In  the  second  part  of  this  dissertation  (Chapter  5)  we  address  the  problem  of 
developing  a  formal  mathematical  framework  suitable  for  both  analysis  and  synthesis 
of  feedforward  neural  networks  using  wavelet  theory.  An  outline  of  the  problem  and 
a  summary  of  our  results  are  given  in  Section  1.3. 

A  common  thread  in  the  approach  taken  here  to  the  above  two  problems  is  the 
theory  of  frames  and  affine  wavelets.  A  second  area  of  commonality  in  these  two 
seemingly  disparate  problems  is  that  both  fall  within  the  realm  of  approximation 
theory. 

1.1  Affine  Wavelet  Transforms 

In  its  continuous  form,  the  (affine)  wavelet  transform  is  an  integral  transform  against 
dilations  and  translations  of  a  single  function  called  the  analyzing  wavelet,  where  for 
an  analyzing  wavelet  g ,  the  translations  and  dilations  are  of  the  form  (x)  = 
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al/'2g(ax  —  b).  In  Chapter  2  we  define,  and  briefly  review  some  properties  of,  wavelet 
transforms  on  L2(IR)  .  Although  the  use  of  dilations  and  translations  of  a  single 
function  is  not  new  (c.f.  [AK69]),  the  more  recent  application  of  these  techniques 
to  signal  analysis  may  be  traced  to  J.  Morlet  [MAFG82,  GM84],  who  proposed  the 
use  of  “wavelets  of  constant  shape”  for  the  analysis  of  seismic  signals.  In  [MAFG82, 
GM84]  and  subsequent  papers  [GMP85,  GMP86],  in  which  the  name  was  shortened  to 
“wavelets”,  attention  was  restricted  to  what  is  now  known  as  the  continuous  wavelet 
transform  where  the  dilations  and  translations  vary  continuously  (i.e.  (a,  b)  £  IR\0  X 
IR). 

Generalization  of  the  concept  of  bases  in  Hilbert  spaces  leads  to  the  notion  of  what 
are  called  frames  (c.f.  [Dau90,  DS52]).  A  review  of  the  key  aspects  of  frame  theory 
is  provided  in  Chapter  2,  and  is  supplemented  by  material  in  Appendix  C.  The  most 
important  property  of  frames  is  that  the  sequence  of  numbers  obtained  by  taking 
inner  products  of  a  vector  /,  in  a  Hilbert  space  7i ,  with  the  elements  of  a  frame  for  H , 
comprise  a  complete  and  ‘stable’  representation  of  the  vector  /.  In  [DGM86]  discrete 
forms  of  the  wavelet  transform  were  considered  in  which  dilations  and  translations 
were  restricted  to  discrete  lattices  so  that  the  resulting  translates  and  dilates  of  the 
analyzing  wavelet  formed  frames.  This  was  a  first  step  towards  the  construction 
of  a  number  of  orthonormal  wavelet  bases  (c.f.  [Mey86,  Bat87]).  The  subsequent 
development  of  the  theory  of  multiresolution  analyses  [Mal89a],  provided  a  framework 
for  the  study  of  a  class  of  orthonormal  wavelet  bases,  and  led  to  the  construction  of 
families  of  compactly  supported  orthonormal  wavelet  bases  by  Daubechies  [Dau88a]. 
Most  of  the  activity  involving  wavelet  transforms  in  signal  analysis  has  since  been 
restricted  to  orthonormal  wavelet  bases,  partly  due  to  their  appealing  computational 
properties  and  their  connections  to  FIR  filters  and  subband  coding  (c.f.  [Dau88a]). 
In  this  dissertation  the  emphasis  is  on  wavelet  frames  rather  than  orthonormal  bases. 

Perhaps  the  most  useful  property  of  affine  wavelet  transforms,  is  the  time- 
frequency  localization  which  arises  from  the  use  of  dilations  and  translations  together 
with  an  ‘admissibility’  requirement  which  forces  the  analyzing  wavelet  to  be  approx- 
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imately  a  bandpass  function.  By  time-frequency  localization,  we  are  referring  to  the 
fact  that  a  wavelet  transform  approximately  provides  a  frequency  analysis  of  signals, 
locally  in  time.  This  should  be  viewed  in  analogy  with  the  short  time  Fourier  trans¬ 
form  (STFT),  where  window  functions  are  used  to  achieve  time  localization  of  the 
frequency  analysis.  Thus  the  wavelet  transform  is  well-suited  to  the  analysis  of  sig¬ 
nals  with  time-varying  frequency  content.  There  exist  numerous  examples  in  nature 
of  signals  with  such  time-varying  frequency  content.  One  interesting  example  pointed 
out  in  [Dau90]  is  that  of  music.  A  musical  score  (c.f.  Figure  1.1)  may  be  regarded 
as  a  time-frequency  localized  representation  in  which  the  length  of  a  note  is  its  time 
duration  and  the  particular  note  specifies  the  frequency  content.  Other  examples  of 
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Figure  1.1:  “Dancing  Wavelets”,  by  F.  W.  Meacham  (Willis  Woodward  &  Co.  1892) 

such  signals  include  visual  images,  where  now  we  have  to  consider  the  two-dimensional 
notion  of  spatio-spectral  localization,  for  instance,  edges  in  an  image  may  be  regarded 
as  a  spatially-localized  high  frequency  components.  Much  of  the  work  described  in  this 
dissertation  is  directly  aimed  at  exploiting  the  time-frequency  localization  afforded  by 
affine  wavelets. 
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1.2  Rational  Wavelets  in  Approximation  and  Identifi¬ 
cation  of  Stable  Linear  Systems 

A  recurring  theme  in  the  study  of  linear  dynamical  systems  is  the  approximation  of 
transfer  functions  by  rational  functions.  It  is  well-known  that  given  a  transfer  function 
G(s),  the  system  described  by  G  has  a  finite-dimensional  state  space  realization  if  and 
only  if  G  is  a  rational  function  of  s.  The  problem  of  rational  approximation  arises 
in  two  contexts  associated  with  linear  dynamical  systems.  First  there  is  the  problem 
of  model  order  reduction  where  high-order  (often  infinite-dimensional)  systems  are  to 
be  approximated  by  lower  order  systems.  In  this  case  it  is  assumed  that  the  system 
(transfer  function)  is  known.  The  second  setting  in  which  the  problem  of  rational 
approximation  arises  is  that  of  system  identification.  In  this  case  it  is  assumed  that 
the  true  system  is  not  known,  and  a  model  of  the  system  is  to  be  constructed  based 
on  observations  of  the  input-output  behavior. 

1.2.1  Rational  Wavelet  Approximations  of  Stable  Linear  Systems 

Rational  approximation  of  infinite-dimensional  systems  is  a  topic  which  has  received 
considerable  attention.  The  problem  is  that  of  approximating  (nonrational)  transfer 
functions  of  infinite-dimensional  systems  by  a  low-order  rational  approximants.  A  va¬ 
riety  of  rational  approximation  techniques,  have  been  proposed  and  studied  over  the 
years.  Among  these  are  methods  based  on  Hankel  singular  values,  truncations  of  bal¬ 
anced  realizations,  Pade  approximations,  and  truncations  of  Fourier  representations 
with  respect  to  specific  orthonormal  bases  (c.f.  [GLP90,  GLP91a]).  Among  the  issues 
which  are  addressed  in  this  area  are:  (i)  the  rate  of  convergence  of  approximation  as 
model  order  is  increased,  (ii)  optimally  convergent  methods,  (iii)  ease  of  computation, 
(iv)  the  class  or  classes  of  problems  well-suited  to  a  given  method. 

In  Chapter  3,  we  propose  a  new  method  for  use  in  the  problem  of  rational  ap¬ 
proximation  of  a  class  of  stable  transfer  functions.  The  transfer  functions  of  interest 
here  are  contained  in  the  Hardy  space  H2(II+)  where  n+,  denotes  the  right-half  com- 


4 


plex  plane  3£e  s  >  0.  Our  approach  utilizes  decompositions  of  H2(II+),  via  frames 
constructed  from  dilations  and  complex  translations  of  a  single  real-rational  analyz¬ 
ing  wavelet.  The  considerable  freedom  in  the  choice  of  rational  analyzing  wavelets 
is  demonstrated  via  a  characterization  of  all  rational  analyzing  wavelets  for  H2(II+) 
in  Theorem  3.4.  One  of  the  main  results  of  Chapter  3  is  Theorem  3.3  which  shows 
that  for  functions  in  H2(II+)  which  are  Laplace  transforms  of  real- valued  functions  in 
L2(0,  oo )  ,  it  is  possible  to  regroup  terms  in  wavelet  series  based  on  rational  analyzing 
wavelets,  such  that  each  term  in  the  new  series  is  a  real-rational  function.  We  refer  to 
the  series  obtained  via  such  regroupings  as  a  wavelet  system  (WS)  decomposition.  A 
wavelet  system  decomposition  may  be  viewed  as  a  parallel  decomposition  of  systems, 
via  time-frequency  localized  finite-dimensional  systems.  The  method  we  propose  for 
rational  approximation  of  transfer  functions  is  based  on  particular  truncations  of  WS 
decompositions.  For  certain  classes  of  time-frequency  localized  systems,  such  trunca¬ 
tions  can  lead  to  fairly  low  order  approximants.  Our  approach  was  in  part  motivated 
by  the  observation  that  transfer  functions  arising  from  physical  systems  often  lend 
themselves  to  reasonably  compact  time-frequency  localized  representations.  We  also 
examine  some  of  the  properties  of  WS  decompositions  including  minimal  state  space 
realizations  of  truncated  WS  representations,  and  establish  coarse  bounds  on  the  ap¬ 
proximation  error  for  the  truncated  series. 

1.2.2  Rational  Wavelet  Models  in  System  Identification 

System  identification  is  a  process  of  using  observed  data  to  determine  usable  descrip¬ 
tions  of  unknown  dynamical  systems.  Here  we  are  concerned  with  systems  with  the 
explicit  assumptions  of  linearity,  time-invariance,  and  causality.  System  identifica¬ 
tion  plays  a  crucial  role  in  any  design  process  involving  physical  systems,  and  thus 
is  deserving  of  the  extensive  attention  which  it  has  received.  For  a  thorough  treat¬ 
ment  and  survey  of  much  of  the  work  in  the  area  of  system  identification  we  refer  to 
[Eyk74,  Lju87,  UR90,  Str81,  Wel81,  You81].  Descriptions  of  systems  obtained  via  the 
identification  process  may  be  either  in  parametric  or  nonparametric  form.  In  this  dis- 


sertation  attention  is  largely  restricted  to  parametric  descriptions.  Methods  leading 
to  parametric  descriptions  of  systems  are  termed  parametric  identification  methods. 

The  essential  steps  in  the  process  of  parametric  identification  are:  (i)  experiment 
design  for  the  collection  of  data,  (ii)  selection  of  a  parametric  model  set,  and  (iii) 
selection  of  an  identification  scheme  to  fit  the  model  to  the  data  via  estimation  of  the 
parameters.  It  has  often  been  noted  that  the  most  important  as  well  as  difficult  step 
in  this  process,  is  the  selection  of  a  suitable  parametric  model  set.  It  is  important  that 
any  a  priori  knowledge  or  engineering  intuition  about  the  unknown  system  be  incorpo¬ 
rated  into  the  formal  properties  of  the  parametric  model  set.  The  types  of  parametric 
models  we  consider  here  are  typically  called  ‘black-box’  models,  and  are  characterized 
by  the  fact  that  the  parameters  in  the  models  have  no  physical  significance. 

In  Chapter  4,  we  show  that  truncations  of  WS  representations,  may  be  used  as 
linear- in-parameters  black-box  models  for  the  purpose  of  identification.  It  is  also 
shown  that  the  time-frequency  localization  of  the  component  systems  of  a  WS  model, 
offer  a  convenient  means  of  incorporating  time  and  frequency  domain  a  priori  infor¬ 
mation  into  the  model.  The  forms  of  prior  knowledge  considered  are  time  constants, 
delays,  and  frequency  weighting.  An  important  feature  of  WS  models  is  that  both 
time  and  frequency  domain  information  may  be  treated  simultaneously,  as  opposed 
to  the  separate  treatment  encountered  in  most  schemes. 

We  make  comparisons  of  WS  models  with  methods  based  on  the  classical  La- 
guerre  filters.  Laguerre  filters,  which  form  orthonormal  bases  for  H2(II+),  have  been 
extensively  studied  in  the  context  of  system  identification  and  rational  approxima¬ 
tion.  The  use  of  Laguerre  filters  in  system  identification  apparently  dates  back  to 
Wiener  [Wie56].  There  has  recently  been  considerable  renewed  interest  in  Laguerre 
methods  due  to  computational  simplicity  of  these  techniques  relative  other  (optimal) 
approximation  schemes  such  as  Hankel  norm  optimal  methods.  More  recent  work  on 
Laguerre  filters  may  be  found  in  [Mak90a,  Wah91,  Par91,  Mak90b,  DZP90]  and  the 
references  contained  therein.  We  demonstrate  by  means  of  examples,  that  for  certain 
important  classes  of  systems,  the  performance  of  the  WS  approximation  may  be  far 
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superior  to  that  of  Laguerre  methods. 


1.3  Wavelet  Analysis  and  Synthesis  of  Feedforward 
Neural  Networks 

Over  the  past  few  years  there  has  been  a  great  deal  of  interest  in  the  study  and  appli¬ 
cations  of  neural  networks.  Neural  networks  are  a  class  of  computational  architectures 
characterized  by  large  numbers  of  simple  processing  elements  which  are  interconnected 
in  a  weighted  manner.  The  term  neural  reflects  the  initial  biological  inspiration  for 
such  networks.  Feedforward  neural  networks,  which  comprise  an  important  subclass 
of  neural  networks,  are  characterized  by  a  well  defined  (forward)  direction  of  signal 
flow,  and  have  found  applications  in  the  approximation  of  (static)  mappings  from 
discrete  data. 

Problems  of  approximation  of  static  mappings  arise  within  a  variety  of  contexts. 
Examples  of  static  map  learning  problems  to  which  feedforward  networks  have  been 
applied  may  be  found  in  areas  such  as  speech  recognition  [LZ89],  control  and  identifi¬ 
cation  of  dynamical  systems  [NP90]  and  robot  motion  control  [Kup88]  [KW90],  among 
others.  There  of  course  exist  numerous  methods  from  mathematical  approximation 
theory  which  may  also  be  considered  for  application  to  these  problems.  The  appeal 
of  the  feedforward  network  methodology  derives  from  empirically  demonstrated  suc¬ 
cess  in  problems  involving  mappings  where  the  domain  or  range,  or  both  are  of  high 
dimension.  For  such  high- dimensional  problems,  the  more  structured  approaches  of 
classical  approximation  theory  often  prove  to  be  computationally  intractable. 

The  empirical  success  of  feedforward  networks  in  approximation  problems 
prompted  questions  regarding  the  class  or  classes  of  mappings  which  may  be  approxi¬ 
mated  within  these  architectures.  In  answer  to  these  questions,  a  number  of  rigorous 
mathematical  descriptions  of  the  approximating  properties  of  feedforward  networks 
(c.f.  [Cyb89,  Cyb88,  HSW89,  HSW90])  were  put  forth.  Most  of  these  descriptions 
rely  on  arguments  of  density,  of  the  class  of  maps  that  can  be  implemented  within  a 


feedforward  network,  in  various  function  spaces.  Tlius  these  results  should  be  viewed 
as  theoretical  justification  for  the  use  of  feedforward  networks  in  certain  classes  of 
approximation  problems.  However,  a  unknown  entity  in  these  formulations  was  the 
exact  feedforward  network  implementation  of  a  given  mapping  from  a  suitable  class. 
In  this  sense  the  above  results  are  nonconstructive  and  provide  no  further  insight 
into  the  problem  of  synthesis  of  feedforward  networks.  The  problem  of  feedforward 
network  synthesis  is  that  of  designing  a  network  suited  to  solve  the  approximation 
problem  at  hand.  Thus  arises  the  question:  is  it  possible  to  construct  a  theoretical 
framework  for  feedforward  networks  which  first  of  all  provides  a  constructive  analysis 
of  the  approximation  properties  of  feedforward  networks,  and  secondly,  is  amenable 
to  the  development  of  synthesis  algorithms. 

In  Chapter  5  we  utilize  the  theory  of  frames  and  affine  wavelets  to  develop  a  for¬ 
mal  mathematical  framework  suitable  for  both  analysis  and  synthesis  of  feedforward 
neural  networks.  Our  approach  was  initially  motivated  by  the  observations  that  (i) 
there  exists  an  inherent  translation  and  dilation  structure  in  feedforward  networks, 
and  (ii)  the  flexibility  of  frame  theory  permits  considerable  freedom  in  the  selection 
of  analyzing  wavelets.  It  is  shown  that  the  commonly-used  ‘sigmoidal  activation  func¬ 
tions’  may  be  combined  in  a  manner  so  as  to  form  admissible  analyzing  wavelets,  and 
thereby  frames  for  L2,  within  the  standard  architecture  of  feedforward  networks.  In 
this  manner  we  obtain  a  constructive  analysis  result  for  the  approximation  of  square 
integrable  functions  by  feedforward  networks.  We  also  show  that  time-frequency  lo¬ 
calization  properties  of  affine  wavelets  can  form  the  basis  for  the  development  of 
systematic  network  synthesis  methods. 
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Chapter  2 


Background  on  Wavelet 
Transforms  and  Frames 


2.1  Preliminaries 

Given  the  wide  range  of  conventions  used  in  the  fields  of  engineering,  mathematics, 
and  physics,  this  section  covers  the  various  definitions  which  will  be  used  throughout 
this  dissertation. 

Most  of  this  dissertation  deals  with  the  Hilbert  space  L2(IR)  ,  the  space  of  all 
measurable  finite  energy  functions,  i.e  all  measurable  functions  defined  on  the  real 
line  such  that 

I  i/(x)|2  dx  <  oo. 

Jn 

This  space  is  equipped  with  a  standard  inner  product,  denoted  by  (•,  •}. 

( f,g)=  [  f(x)g(x)dx. 

J  R 

2.1.1  Fourier  and  Laplace  Transforms 
Fourier  Transforms  on  L2(IR) 

The  definition  of  Fourier  transform  which  is  used  here  is  according  to  the  convention 
commonly  used  in  engineering  literature. 
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Given  a  function  /  £  L2( IR)  its  Fourier  transform  f  is  defined  as, 


/(«)  =  [  f(x)e-iwxdx, 

JR 

where  convergence  of  the  integral  is  taken  in  the  L2  sense.  The  corresponding  inversion 
formula  is, 

/(*)  =  ^~  f  iVywxdx. 

J]R, 

The  unitary  nature  of  the  Fourier  Transform  is  expressed  in  Parseval’s  theorem. 


Theorem  2.1  (Parseval)  Given  f  £  L2(IR)  with  Fourier  transform  f, 


/E|/(l)i2  ix  =  h. /,I'M 


du. 


A  generalization  of  Parseval’s  theorem  which  is  attributed  to  Plancheral  is,  the  fact 
that, 

1 


(/.9>=27(/.s). 


Laplace  Transforms 


For  functions  defined  on  the  positive  half-line  IR+  the  (unilateral)  Laplace  transform 
is  defined  by, 

r  oo 

F(s)  ~  /  f(x)e~sxdx, 

Jo 

where  the  integral  converges  on  a  half-plane  5?e  s  >  cr,  a  being  the  abscissa  of  conver¬ 
gence.  Inversion  of  the  Laplace  transform  is  accomplished  by  the  inversion  formula, 


1  n+ioo 

f(x)  =  — :  F(s)estds , 

Z7TZ  J y — i oo 

where  7  is  taken  larger  than  the  abscissa  of  convergence. 


2.1.2  Time-Frequency  Localization 

The  notion  of  localization  in  joint  time-frequency  space  is  important  in  many  areas  of 
signal  analysis  and  is  crucial  to  the  utility  of  wavelet  transforms.  In  this  section  we 
introduce,  and  to  some  extent  formalize  the  concept  of  time-frequency  localization. 
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In  the  context  of  time-frequency  localization,  we  are  interested  in  functions  which 
are  essentially  time-limited  to  some  interval  [— T,T ]  and  essentially  band-limited  to 
some  frequency  range  [D0,  fli]  |J[— Do>  —  fii].  One  measure  of  how  well  a  function  is 
localized  in  either  the  time  or  frequency  domains  is  given  in  terms  of  the  second  mo¬ 
ments  (or  variances)  of  the  squared  magnitude  functions.  There  is  however  a  limit  to 
the  extent  of  joint  time-frequency  localization  attainable,  which  is  specified  by  Heisen¬ 
berg’s  inequality.  Heisenberg’s  inequality  states  that  both  for  the  second  moment  of 
|/|2  and  the  second  moment  of  |/|2  cannot  simultaneously  be  made  arbitrarily  small, 
and  in  fact  the  the  product  of  these  two  second  moments  is  bounded  below  away  from 
zero.  More  precisely,  we  have  the  following  classical  result  (c.f.  [DM72]). 


Theorem  2.2  (Heisenberg’s  inequality)  For  f  6  L2(1R)  , 

[  x1\f(x)\dx  X  [  u>2  f(u>) 

J] R  J  R. 


2  du  >  -I1*"4 
-  4 


(2.1.1) 


Equality  in  (2.1.1)  is  achieved  only  by  functions  which  are  constant  multiples  of  the 
Gaussian  function.  Thus  Gaussian  functions  are  optimally  concentrated  in  joint  time- 
frequency  space. 

Let  us  make  the  following  definitions  for  the  centers  of  concentration  of  functions 
/,  whose  Fourier  transforms  have  even  magnitude  |/(—  w)|  =  |/(u>)|. 

Definition  2.1  Given  a  function  f  €  L2(IR)  ,  with  Fourier  transform  f ,  such  that 

l/(-MI  =  I/Ml; 


(1)  The  center  of  concentration,  xc(f),  of  f,  is  defined  as 

x‘(/)=wll!/(I)|2<iI- 


(2)  the  center  of  concentration,  wc(|/|2),  o/|/|2.  (or  center  frequency  of  f)  is  defined 
as 

~  1  r°°  — 
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Note  that  this  definition  also  applies  to  functions  whose  Fourier  transforms  are  sup¬ 
ported  strictly  on  IR+.  If  we  now  define  the  second  moments, 

a(f)  =  pip  J^x~  x^f)?\fi.x)\2dx 

^  1  7*00  ^  ^ 

<f)  =  (w-wc(/)2|/M|2do;, 

Thus  <r(/)  and  cr(f)  may  be  respectively  used  as  measures  of  time  and  frequency 
localization. 

A  second  related  view  of  time-frequency  localization  may  be  stated  in  terms  of 
regions  which  contain  most  of  the  ‘energy’  in  the  signal.  Let  us  define  two  projection 
operators  Pq  and  Qj  on  L2(IR)  by, 

(fti /)(“>)  =  Xi_o>n]  («)/(«) 

(Qrf)(x)  =  Xt_r,T1  (*)/(*), 

where  xr  denotes  the  indicator  function  of  the  interval  I.  Using  these  definitions  we 
assume  that  the  functions  /  of  interest  to  us  are  such  that, 

||(1  -  Qr)f\\  <  *  and  ||(1 -(P0l  -Pfi0))/||  <  €•  (2.1.2) 

To  relate  flo,  fli  and  T  to  a  function  /  E  L2(IR)  in  a  more  precise  manner,  we  make 
the  following  definitions. 

Definition  2.2  Given  f  E  L2(IR)  ,  /  :  IR  — *•  IR,  with  Fourier  transform  f ,  and 
centers  of  concentration  xc(f )  and  o;c(|/|2), 

V{f\e)  =  i[x0,xl\\  \xc(f)  —  x0|  =  \xc(f)  ~  xi\  and  [  \f(x)\2dx  <  e]|/||2 1  , 

(  Jx£] R\[x0,xi]  J 

and, 

V(f;c)  =  |[w0,wi]  :  Wo  =  max(0,wb),  |wc(|/|2)  -  wo|  =  kc(|/!2)  ~  “>i\  , 

and  [  \f(u>)\2du  <  ej|/||2\  . 
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(1)  The  epsilon  support  (or  time  concentration)  of  f,  denoted  e-supp(f,e)  is  the  set 

[x0(/),xi(/)]  e  such  that, 

\xo(f)-x1(f)\  =  inf  |®o-*i|. 

[x0  ,xi]eV 

(2)  The  epsilon  support  of  \f\2  (or  frequency  concentration  of  f)  denoted 

e-supp(\f\ 2,e)  is  the  set  [wo(/), wi(/)]  £  such  that 

K(/)  -  w0(/)|  =  mf  ^  |wx  -  w0| . 

[wo,wi]e"P 

Remark:  The  e-support  of  /  is  the  smallest  (symmetric  about  xc(f))  interval 
containing  (1  —  e)x  the  total  signal  energy.  Note  that  when  xc(f)  =  0,  taking 
*o(/)  =  ~xi (/)  —  —T,  do  —  wo (/),  fli  =  u>i (/),  and  e  =  e ,  Equation  2.1.2  is 
satisfied. 


2.1.3  Time-Frequency  Localized  Representations  and  Windowed 
Fourier  Transforms 


In  a  variety  of  applications  of  signal  processing  one  often  encounters  problems  where 
it  is  desirable  to  analyze  the  frequency  content  of  a  signal  locally  in  time.  Examples  of 
applications  where  joint  time-frequency  localized  representations  are  desirable  can  be 
found  for  instance  in  image  processing  [Mal89c]  [Mal89b]  [Dau83]  [PZ88],  and  analysis 
of  acoustic  signals  [KMMG87].  To  illustrate  this  point,  consider  the  function  /  defined 


-l 


Ant 


is  an 


on  [ — 7T,  7r]  shown  in  Figure  2.1.  Since  the  trigonometric  system 
orthonormal  basis  for  the  Hilbert  space  L2[— t,7t],  we  can  represent  /  by  its  Fourier 
expansion 

(2.1.3) 


m  =  £  c»4=e“, 


where  the  Fourier  coefficients  cn’s  are  computed  via  the  inner  products, 


dt. 


The  representation  in  (2.1.3)  gives  a  precise  frequency  analysis  of  the  signal  /,  in  the 
sense  that  since  each  basis  function  is  exactly  localized  in  the  frequency  domain  at  u  = 
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n,  the  coefficients  cn  readily  reveal  the  presence  or  absence  of  these  frequencies  in  the 
signal.  However,  since  each  basis  element  has  constant  magnitude  (=  [v^rj  )  over 
the  entire  interval  [— 7r,7r],  the  coefficients  are  obtained  by  integrating  contributions  of 
the  corresponding  frequencies  with  equal  weighting  over  the  entire  time  span.  Hence 
the  cn’s  cannot  readily  reveal  the  fact  that  the  signal  is  mostly  flat  and  that  high 
frequency  components  are  localized  to  a  short  time  interval.  It  is  the  need  to  analyze 


Figure  2.1:  Example  of  signal  with  time- varying  frequency  content. 


such  signals  with  time- varying  frequency  content  which  has  led  to  the  study  of  function 
decompositions  which  naturally  capture  this  behavior.  Such  decompositions  call  for 
‘basis’  functions  which  are  themselves  well-localized  in  both  time  and  frequency.  We 
refer  to  such  decompositions  as  time-frequency  localized  representations. 

The  short-time  (or  windowed)  Fourier  transform  (STFT)  was  developed  as  one 
means  of  obtaining  such  a  time-frequency  localized  representation.  The  STFT  in¬ 
volves  multiplying  a  signal  /  by  a  window  function  g  and  then  computing  the  Fourier 
coefficients  of  the  resulting  product  fg.  Usually  the  window  function  g  is  chosen  to  be 
either  compactly  supported  or  at  least  rapidly  decaying.  Hence  if  the  window  function 
g  is  centered  at  x  =  0,  the  Fourier  coefficients  of  fg  give  a  picture  of  the  frequency 
content  of  /  locally  in  time  near  x  =  0.  Further  analysis  of  the  signal  may  then  be  per- 
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formed  by  shifting  the  window  function  in  time  and  once  again  computing  the  Fourier 
coefficients  of  /  multiplied  by  the  shifted  window  function  g(x  —  xq).  Repeating  this 
process,  results  in  a  doubly  indexed  set  of  coefficients, 

cm,n(/)  =  [  f(x)g(x  -  nx0)e~'imuJoXdx,  m,n6Z.  (2.1.4) 

J  R 

which  provide  a  frequency  analysis  of  /  over  different  (localized)  regions  of  time.  A 
question  that  naturally  arises  regarding  the  STFT  is;  do  the  coefficients  {cTO,n(/)} 
provide  a  complete  and  ‘stable’  representation  of  the  original  signal  /.  As  we  will  see 
in  Section  2.2.4,  this  question  may  be  addressed  using  the  theory  of  frames.  Gabor 
in  [Gab46]  proposed  a  transform  of  the  form  in  (2.1.4)  for  use  in  data  transmission 
using  a  Gaussian  window  function  g.  Gabor  originally  proposed  this  transform  with 
xou>o  =  2tt,  which  in  fact  results  in  unstable  reconstructions  (c.f.  [Dau90]). 

Remark:  Here  we  refer  to  the  coefficient  sequence  {cm>n(/)},  as  a  complete  rep¬ 
resentation  of  /,  if  /,  can  be  reconstructed  from  the  sequence.  If  furthermore,  the 
reconstruction  of  /,  is  via  a  bounded  operator,  we  refer  to  the  representation  as  stable. 

2.2  Wavelet  Transforms 

2.2.1  Weyl-Heisenberg  Wavelets  and  the  Windowed  Fourier  Trans¬ 
form 

From  another  viewpoint,  the  coefficients  {cm,n}  in  the  windowed  Fourier  transform 
(2.1.4),  are  obtained  via  the  L2(IR)  inner  products  of  /  with  a  sequence  of  functions 
{gm,n}  which  are  generated  via  modulations  and  translations  of  a  single  function  g.  If 
we  define  the  modulation  and  translation  operators  Ea  and  T&  on  L2(IR)  respectively 
by, 

(Eaf)(x)  =  eia*f(x) 

0 Tbf)(x )  =  f(x-b ), 
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then  we  have 

9m,n(.x )  =  (ErnUj0TnX0g')(x')  =  e  0  g{%  —  uxq). 

The  sequence  gm,n  thus  defined  can  be  regarded  as  samples  on  a  discrete  sublattice  of 
time-frequency  space  of  the  functions 

9{p’q)  =  (EpTqg)(x)  =  e^^TTq), 

where  p  and  q  are  now  continuous  variables  in  IR.  An  important  property  of  the 
functions  g^p,q^  is  the  resolution  of  the  identity  expressed  in 

/  J \{f,9{p’g))\2  dpdq  =  2T\\g\\2\\f\\2. 

The  resolution  of  the  identity  implies  that  given  the  function, 

($f)(p,q)={f,9(p'q))e  L2(IR2),  (2.2.5) 

we  can  recover  /  via  the  reconstruction  formula, 

f=ijj  {f,gM)gMdpdq.  (2.2.6) 

The  mapping  $  :  L2(IR)  — >  L2(IR2)  as  defined  in  (2.2.5)  is  often  referred  to  as  the 
continuous  Wey l- Heisenberg  wavelet  transform.  The  resolution  of  the  identity  tells  us 
that  $  is  an  isometric  mapping  up  to  a  constant  factor  from  L2(1R)  to  L2(IR2)  .  And 
the  reconstruction  formula  (2.2.6)  defines  for  us  the  inverse  transform. 

Remark:  The  name  Wey  1- Heisenberg  is  used  in  reference  to  the  fact  that  the  op¬ 
erations  of  modulation  and  translation  arise  via  the  action  of  the  left-regular  repre¬ 
sentation  of  the  Weyl-Heisenberg  group  on  L2(IR)  .  Affine  wavelet  transforms  which 
we  introduce  in  the  next  section  are  also  associated  with  the  representation  of  a  Lie 
group  on  L2(ffi.)  ,  namely  the  affine  (or  ax  +  b )  group. 

Clearly  the  continuous  Weyl-Heisenberg  wavelet  transform  ($/)(p,  q)  =  l^f,g[p'q)'KJ 
is  a  complete  and  ‘stable’  representation  of  any  function  /  £  L2(IR)  .  Consider  now 
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the  case  where  instead  of  allowing  p  and  q  to  vary  continuously,  we  select  a  discrete 
lattice  {(Pn,  <?«)},  and  define  the  mapping  on  L2(IR)  by 

^/={(/,<?{p"’?n)}}.  (2.2.7) 

In  this  case  the  mapping  is  known  as  the  discrete  Wey l- Heisenberg  wavelet  trans¬ 
form.  Thus  the  STFT  can  now  be  viewed  as  a  discrete  Weyl-Heisenberg  wavelet 
transform.  Of  course  there  remains  the  question  of  whether  knowledge  of  the  se¬ 
quence  g(pn,qn^ |  is  sufficient  to  recover  the  function  /  in  a  stable  manner.  As 

we  will  see  in  Section  2.2.3  the  answer  to  these  questions  rely  on  selection  of  the  dis¬ 
crete  lattice  {(pn,qn)}  such  that  the  sequence  of  functions  forms  &  frame  for 

L2(1R)  . 

2.2.2  Affine  Wavelet  Transforms 


Modulations  and  translations  as  described  in  the  last  section  are  not  the  only  means 
of  generating  a  family  of  time-frequency  localized  functions  from  a  single  suitable 
analyzing  function.  Let  us  consider  instead  the  actions  of  dilation  and  translation 
provided  by  the  left-regular  representation  of  the  affine  (ax  +  b )  group  on  L2(IR)  .  Let 
Da  denote  the  dilation  operator  on  L2(IR)  defined  by  , 

( Daf)(x )  =  a1/2f(ax),  a±  0, 


and  define 


h^a,b\x)  =  ( DaTf,h)(x )  =  al/2h(ax  —  b ). 


In  the  Weyl-Heisenberg  case,  there  were  no  restrictions  placed  on  the  analyzing  func¬ 
tion  in  deriving  the  resolution  of  the  identity.  In  the  affine  case  however,  we  are 
forced  to  consider  only  analyzing  functions  h ,  which  satisfy  the  following  admissibility 
condition, 


Ch=  f 
J  R 


\h(u) 


-du  <  oo. 


(2.2.8) 

/] R  M 

Any  h  £  L2(IR)  satisfying  the  admissibility  condition  (2.2.8),  is  called  an  admissible 
analyzing  wavelet  (c.f.  [HW89,  Dau90]). 
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Remark:  Note  that  for  any  function  h  with  sufficient  decay  at  oo,  the  admissibility 
condition  is  equivalent  to  requiring  that  f  h(x)dx  =  0. 

Given  an  admissible  analyzing  wavelet  h  £  L2(]R)  ,  there  exists  once  again  a 
resolution  of  the  identity  which  may  be  expressed  as 

{f'g)  =  kl  I 

for  all  /, g  £  L2(1R.)  .  Analogous  to  the  Weyl-Heisenberg  case,  we  can  define  an 
operator  W  :  L2(1R)  — ♦  L2(IR2)  by, 

C Wf)(a,b)=(f,h (2.2.9) 

The  mapping  W  defined  in  (2.2.9)  is  called  the  continuous  (affine)  wavelet  trans¬ 
form,  and  furthermore  the  resolution  of  the  identity  in  this  case  gives  the  following 
reconstruction  formula, 

/  =  ^  /  /  (/,  h^)  (2.2.10) 

Remark:  Note  that  in  Equation  (2.2.10)  the  measure  used  for  integration  is  now 
a~2dadb. 

Once  again  as  in  the  Weyl-Heisenberg  case  we  can  consider  a  discrete  lattice 
{an,  bn}  and  define  the  corresponding  discrete  wavelet  transform  W&  on  L2(IR)  by, 

^/  =  {(/,^(an’M)}.  (2.2.11) 

Here  also  we  need  to  ask  the  questions  regarding  completeness  and  stability  of  the 
sequence  j^/, /i(an’6n)^)j. 

2.2.3  Bases  and  Frames  in  Hilbert  Spaces 

By  definition,  an  orthonormal  basis  for  a  separable  Hilbert  space  H  is  a  sequence 
{hn} C  Ti.  of  normalized  (||  •  ||  =  1)  vectors,  which  is  complete  in  the  sense  that  the 
closed  linear  span  of  {hn}  is  dense  in  7 i,  and  furthermore  the  vectors  are  mutually 
orthogonal,  i.e. 
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A  key  property  of  an  orthonormal  basis,  known  as  Parseval’s  identity,  is  that  the 
sequence  of  (Fourier)  coefficients  obtained  via  inner-products  of  f  EH  and  the  basis 
elements  hn  is  such  that, 

E  KAMI2  =  11/11^- 

n 

Hence  the  sequence  of  Fourier  coefficients  with  respect  to  an  orthonormal  basis 
form  a  square-summable  sequence.  Furthermore  by  the  Riesz- Fischer  theorem  we 
know  that  every  square-summable  sequence  may  be  obtained  in  this  manner.  Par¬ 
seval’s  identity  together  with  the  Riesz-Fischer  theorem  shows  that  the  operator 
T  :  L2(IR)  — ►  £2(2Z2),  r  :  /  hf  {(/,  hn)}  is  a  unitary  operator.  Hence,  given  an 
orthonormal  basis  {hn}  for  a  Hilbert  space  Ti,  we  have  the  following  Fourier  expan¬ 
sion  of  any  f  GH, 

f  =  52(f,hn)hn.  (2.2.12) 

n 

Thus  for  an  orthonormal  basis  {hn},  the  sequence  of  Fourier  coefficients  {{/.  hn)}  is 
a  complete  and  stable  representation  of  /. 

Orthonormal  bases  are  not  the  only  sequences  with  this  property.  If  we  relax  the 
requirement  of  orthogonality  and  normalization,  while  retaining  the  weaker  property 
of  linear  independence,  we  can  consider  sequences  which  are  know  as  Riesz  bases  (c.f 
[You80]). 

Definition  2.3  A  sequence  of  vectors  {hn} C  TC  is  called  a  Riesz  basis  if  the  sequence 
is  complete  in  TC  and  there  exist  two  constants  0  <  C'i  <  C-i  <  oo,  such  that  for 
arbitrary  positive  m  and  arbitrary  scalars  {ai, . . . ,  an}, 

mm  m 

caEKI2<ilE^M2<^EKI2- 

71=1  71=1  71=  1 

Equivalently, 

(1)  The  vectors  {hn}  are  linearly  independent  and, 

(2)  There  exist  constants  0  <  .4  <  B  <  oo  such  that  for  any  f  £  Tt, 

A\\ff  <J2\(f-hn)f  <  B\\f\}2. 
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Given  a  Riesz  basis  {hn}  for  a  Hilbert  space  H ,  the  Hahn-Banach  theorem  asserts 
the  existence  of  a  unique  biorthogonal  sequence  {gn}  ({ hn,gm )  =  6m^n)  such  that  for 
every  /  G  H, 

f  =  (/»  M  9n  =  Yl  {/>  9n)  K- 

n  n 

Hence  in  the  case  of  Riesz  bases  as  well,  the  sequence  {(/,  hn)}  is  a  complete  and 
stable  representation  of  any  /  6.H. 

The  notion  of  frames  is  yet  another  level  of  generalization  of  orthonormal  bases 
where  even  linear  independence  can  be  sacrificed  while  still  retaining  the  properties 
of  completeness  and  stability.  Note  that  in  sacrificing  linear  independence,  strictly 
speaking,  we  will  not  even  have  a  (Schauder)  basis. 

Frames,  which  were  first  introduced  by  Duffin  and  Schaeffer  in  [DS52],  are  defined 
as  follows. 

Definition  2.4  Given  a  Hilbert  space  H,  a  sequence  of  vectors  {hn}%L_ ^  C  H,  is 
called  a  frame  if  there  exist  constants  A  >  0  and  B  <  oo  such  that 

A||/||2<El</^n>|2<5||/||2,  (2.2.13) 

n 

for  every  f  G  .  A  and  B  are  called  the  frame  bounds. 

Remarks: 

(a)  A  frame  {hn}  with  frame  bounds  A  =  B  is  called  a  tight  frame. 

(b)  A  frame  {/rn},  which  ceases  to  be  a  frame  upon  removal  of  a  single  element,  is 
called  an  exact  frame. 

(c)  Every  orthonormal  basis  is  a  tight  exact  frame  with  A  =  B  =  1. 

(d)  A  tight  frame  of  unit-norm  vectors  for  which  A  =  B  =  1  is  an  orthonormal 
basis. 
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Given  a  frame  {/in}  in  the  Hilbert  space  H ,  with  frame  bounds  A  and  B ,  define 
the  operator  T  :  L2(IR)  —»■  l2  by 

17  =  {</,/>«}}.  (2.2.14) 

Remark:  Note  that  in  general  due  to  the  redundancy  allowed  by  not  requiring 
linear  independence,  the  range  of  the  operator  T  will  not  be  all  of  i2,  but  instead 
a  closed  proper  subspace  of  l2.  In  Section  2.3.1  we  describe  how  this  redundancy 
is  useful  in  providing  ‘robust’  representations.  Here  robustness  refers  to  the  fact 
that  the  reconstruction  error  induced  by  perturbations  of  the  coefficients  {(/,  hn)},  is 
diminished  due  to  a  projection  operation  onto  the  range  of  T. 

Definition  2.5  Given  a  frame  { hn }  for  a  Hilbert  space  Tt,  the  frame  operator  S  : 
H  — >  hi  is  defined  as  S  =  FT,  where  F*  is  the  adjoint  of  F.  Thus  for  any  f  £  H, 

Sf  =  'Z<f,hn>hn.  (2.2.15) 


Some  properties  of  the  frame  operator  which  follow  from  the  definition  of  a  frame 
are  collected  together  in  the  following  theorem. 


Theorem  2.3  ([DS52])  (1)  S  is  a  bounded  linear  operator  with  AI  <  S  <  BI, 

where  I  is  the  identity  operator  in  hi. 


(2)  S  is  an  invertible  operator  with  B~  ~lI  <  S-1  <  A"1/. 

(3)  Since  AI  <  S  <  BI  implies  that  \\I  -  ^g5||  <  1,  5"1  can  be  computed  via  the 
Neumann  series, 


S~lg  = 


A  +  B 


k= 0 


Eh- 


2 


A  +  B 


k 

S\  g. 


(2.2.16) 


(4)  The  sequence  {S~lhn}  is  also  a  frame,  called  the  dual  frame,  with  frame  bounds 
f?_1  and  A-1 . 
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By  the  above  properties  of  the  frame  operator,  we  have  the  following  representation 
theorem  for  H  with  respect  to  a  frame  or  associated  dual  frame. 

Theorem  2.4  ([DS52])  Given  any  f  £  7t,  f  can  be  decomposed  in  terms  of  the 
frame  ( or  dual  frame )  elements  as 

f  =  Y^<  f>S~lhn  >  K  =  <  f,hn  >  (2.2.17) 


In  general  frame  representations  are  not  unique,  i.e.  it  is  possible  that  that  sequence 
{cn}  =  {(/, 5_1hn)}  is  not  the  only  sequence  of  coefficients  such  that  /  =  £)cnhn. 
However  the  following  theorem  shows  that  any  other  sequence  of  expansion  coefficients 
must  be  related  to  the  sequence  {(/,  5_1hn)}  via  a  Pythagorean  relation. 

Theorem  2.5  ([DS52])  Given  f  £  hi,  if  there  exists  another  sequence  of  coefficients 
{an}  (other  than  the  sequence  {<  /,  S~lhn  >}j  such  that  f  =  then  the  an ’s 

are  related  to  the  coefficients  given  in  (2.2.17)  by  the  formula, 

la™|2  =  I  <  f'  S~lhn  >  I2  +  I  <  S~lhn  >  -an\2-  (2.2.18) 


Remark:  Note  that  by  Theorem  2.5,  the  sequence  of  coefficients  obtained  via  the 
inverse  frame  operator  is  optimal  in  the  sense  of  minimum  t'2  norm. 

2.2.4  Discrete  Wavelet  Transforms  and  Frames 

Armed  with  the  theory  of  frames,  it  is  now  possible  to  discuss  completeness  and  stabil¬ 
ity  of  the  discrete  wavelet  transforms  ^  and  Wd  defined  in  Section  2.2.  Simply  stated, 
it  is  necessary  to  select  the  discrete  lattices  {(pn,<7n)}  and  {(an,  bn)},  in  such  a  man¬ 
ner  that  the  sequences  |  and  respectively  form  frames  for  L2(IR)  . 

As  this  dissertation  deals  almost  exclusively  with  affine  wavelets,  we  restrict  discus¬ 
sion  to  the  affine  case  and  make  some  remarks  regarding  discrete  Weyl- Heisenberg 
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wavelets  at  the  end  of  this  section.  In  the  affine  case  we  restrict  attention  to  regular 
discrete  lattices  in  which  discretization  of  dilations  is  treated  exponentially.  That  is, 
we  consider  discrete  lattices  of  the  form  {(a.™7nb0)}mne%  for  fixed  constants  a0  >  1 
and  bo  >  0.  We  note  that  it  is  not  necessary  to  restrict  to  lattices  of  this  form  and 
in  general  even  the  requirement  of  regular  lattices  is  not  necessary.  We  refer  readers 
interested  in  more  general  discretizations  to  [FG]  in  which  frames  are  constructed  by 
utilizing  the  underlying  group  structure. 

Given  ao  >  1,  £>o  >  0,  and  an  admissible  analyzing  wavelet  ^  £  L2(1R)  define, 

i>m,n(x)  ~  (Da™Tnboi>)(x)  =  a^V^o* X  -  nb0).  (2.2.19) 

Daubechies  in  [Dau90]  studied  the  problem  of  constructing  frames  of  the  form  {i/’m, n}. 
Under  very  mild  additional  hypotheses  on  the  analyzing  wavelet  -0,  it  is  possible  to 
determine  values  of  the  dilation  stepsize  ao  and  translation  stepsize  bo  such  that  the 
family  of  functions  is  a  frame  for  L2(1R)  .  In  this  case  we  say  that  ,a0,  &o ) 

generates  an  affine  frame  for  L2(IR)  .  Given  an  admissible  mother  wavelet  ip  G  L2(IR)  , 
the  following  theorem  by  Daubechies  [Dau90]  can  be  used  to  numerically  determine 
values  of  the  parameters  ao  and  bo  for  which  (^,  ao,  bo)  generates  an  affine  frame  for 
L2(ffi.)  . 

Theorem  2.6  ([Dau90])  Let  h  £  L2(1R)  and  a  >  1  be  such  that: 


(1) 


m(h ;  a)  =  ess  inf  |h(amu;)|2  >  0 
Me[i,a] 


(2.2.20) 


(2) 


M(h\  a)  =  ess  sup  \h(amu)\2  <  oo  (2.2.21) 

|w|S[l,a]  m 


(3) 

OO 

lim  2  Y'  f3(2Trklb)^2(3{-2TTk/b)^'2  =  0,  (2.2.22) 

6~>° 

where 


/3(s)  =  ess  sup  'Y  \h(amu!)\\h(amuj  —  .s)|. 

|^|6[l»a]  m 
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Then  there  exists  Bc  >  0  such  that  {h,  a ,  b )  generates  an  affine  frame  for  each  0  <  b  < 

Bc. 


Proof:  See  Appendix  A 

The  following  is  a  useful  corollary  of  the  above  theorem. 

Corollary  2.1  ([Dau90])  Ifh  £  L2(IR)  and  a  >  1  satisfy  the  hypotheses  of  Theorem 
2.6  then, 


OO 

Bc>bc  =  inf  {6|  m(h;  a)  -  2  ^  P(2TTk/b)1^2P(-2Trk/b)1^'2  <  0} 

fc=i 


and  for  0  <  b  <  bc,  the  frame  bounds  A  and  B  can  be  estimated  as, 


A  > 

B  < 


b  1  (  m(h;  a)  —  2  y)  /?(27rfc/6)1/'2/3( — 27tA;/6) 


1/2 


k=l 

oo 


b  1  lM(h;a) +  2^2(3(2-Kk/b)1/2l3(—2Trk/b)1/2 

V  fc=i  J 


(2.2.23) 


(2.2.24) 


Given  an  affine  frame  {i>m,n}  for  L2(IR)  ,  it  is  clear  that  the  operator  F  defined 
in  Section  2.2.3  is  precisely  the  discrete  wavelet  transform  operator  Wj,  (see  (2.2.11)). 
Hence  by  the  frame  property,  W&  is  a  bounded  operator  from  L2(IR)  to  f2(Z2).  Fur¬ 
thermore,  using  the  frame  operator  S  =  W^Wj,  we  get  the  following  wavelet  decom¬ 
position  of  L2  (IR)  : 

/=EE(/-rl^«)^»-  (2.2.25) 

and  the  corresponding  dual  frame  decomposition, 

f=  £  £  </,t/w>S~Vm,n.  (2.2.26) 

Since  the  frame  property  assures  that  the  frame  operator  S  is  bounded  and  has 
bounded  inverse,  we  see  that  in  the  case  where  is  a  frame,  Wdf  is  a  complete 
and  stable  representation  of  any  /  £  L2(IR)  . 
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Remarks:  Our  discussion  in  this  section  has  been  limited  to  the  general  setting  of 
frames  of  affine  wavelets.  It  is  however  possible  to  construct  orthonormal  bases  of 
affine  wavelets  (c.f.  [Dau88a])  and  also  biorthogonal  systems  (Riesz  bases)  of  affine 
wavelets  (c.f  [Chu92]).  However  the  constructions  of  orthogonal  or  even  biorthogonal 
wavelets  are  too  restrictive  for  the  applications  considered  in  this  dissertation. 

Remarks  on  Discrete  Weyl-Heisenberg  Wavelets 

In  the  Weyl-Heisenberg  case  as  well,  a  theorem  similar  to  Theorem  2.6  is  given  in 
[Dau90].  The  regular  discrete  lattices  for  the  Weyl-Heisenberg  wavelets  are  such 
that  both  modulations  and  translations  are  treated  linearly;  i.e.  lattices  of  the  form 
{(mpo,  A  point  to  be  made  regarding  Weyl-Heisenberg  wavelets  is  that  it  is 

natural  to  define  the  time-frequency  density  in  terms  of  the  product  po%-  Further¬ 
more,  there  exists  in  this  case  a  critical  density,  such  that  the  family  of  functions 
|q(mpo’n?°)|  can  only  form  a  frame  if  the  product  poqo,  is  less  than  or  equal  to  this 
critical  density.  For  po<Zo  greater  than  the  critical  density,  |qimpo’n9°)|,  will  be  incom¬ 
plete.  In  contrast  to  this,  there  exists  no  such  critical  density  in  the  affine  case,  and 
in  fact  it  is  not  possible  to  define  an  equivalent  notion  of  density  (c.f  [Dau90]). 

2.2.5  Time-Frequency  Localization  Properties  of  Wavelets 

Perhaps  the  most  useful  property  of  wavelet  decompositions,  is  the  property  of  time- 
frequency  localization.  To  illustrate  the  manner  in  which  time-frequency  localization 
arises  in  affine  wavelets,  recall  the  admissibility  condition  (2.2.8)  which  the  analyz¬ 
ing  wavelet  must  satisfy.  Satisfying  (2.2.8)  imposes  the  requirement  that  the  Fourier 

transform  of  the  analyzing  wavelet  ip  must  have  a  zero  at  w  =  0,  i.e.  V>(0)  =  0. 

~  2 

Furthermore,  the  admissibility  condition  requires  that  rp(u>)  — *■  0,  faster  than  u.  In 
addition,  given  smoothness  assumptions  on  ib  G  L2(IR)  ,  we  know  that  —*•  0  as 
u  — >  oo.  Thus,  an  admissible  analyzing  wavelet  must  approximately  mimic  the  behav¬ 
ior  of  a  ‘bandpass’  function.  The  required  decay  in  the  time  domain  approximately 
localizes  the  analyzing  wavelet  in  time.  In  general  of  course  there  are  only  moderate 
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constraints  on  the  decay  at  infinity  in  both  the  time  and  frequency  domains.  However, 
within  the  constraints  imposed  by  Heisenberg’s  inequality,  there  is  still  a  great  deal  of 
freedom  in  the  choice  of  analyzing  wavelets  with  reasonably  fast  decay  in  both  time 
and  frequency.  Since  an  affine  frame  is  constructed  via  translations  and  dilations  of 
the  analyzing  wavelet  V ;,  we  need  to  look  at  how  these  operations  affect  the  time- 
frequency  concentration  of  ip  .  Assume  the  analyzing  wavelet  ip  is  concentrated  on 
the  rectangle  of  time-frequency  defined  by, 

Qo,oO)  =  [*o(V0,*iW]  X  [u0(ip),ui(ip)] 

=  R(i>)  x  fi(i p). 

Recalling  the  dilation  property  of  the  Fourier  transform, 

f(ax )  -5-  a_1/(a_1w), 

we  see  that  the  translates  and  dilates  i/’m.n  are  concentrated  on  time-frequency  rect¬ 
angles  defined  by, 

Qm,n(V0  =  [%m(x0(ip)  +  nb0),%m{x1(ip)  +  nbo)}  X  [a™a.'0(», 

Thus  dilation  shifts  the  frequency  concentration  of  the  wavelets  on  a  logarithmic 
scale.  For  large  values  of  m  the  frequency  concentration  is  shifted  to  the  higher 
frequencies  and  as  m  decreases,  the  frequency  concentration  shifts  towards  zero.  This 
is  illustrated  by  Figure  2.2,  where  we  see  that  as  m  gets  large,  the  function  ip  gets 
narrower  and  thereby  includes  higher  frequency  components.  Translation  simply  shifts 
the  time  concentration  by  steps  proportional  to  aQm&o.  The  factor  a p™,  appearing  in 
the  translation  stepsize  accommodates  the  dilations  by  keeping  the  translation  steps 
small  for  large  m  (narrow  functions),  and  larger  for  large  m  (wide  functions). 

Windowed  Fourier  Transforms  vs.  Affine  Wavelets 

There  is  an  important  distinction  between  the  type  of  time-frequency  localization  aris¬ 
ing  in  the  Weyl- Heisenberg  case  and  the  type  of  time-frequency  localization  afforded 
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Figure  2.2:  (a)  Translations  in  the  time  domain,  (b)  The  effect  of  dilations  in  the 
time  domain  (c)  The  effect  of  dilations  in  the  frequency  domain 
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by  affine  wavelets.  In  the  Weyl-Heisenberg  case  the  ‘width’  of  the  window  function 
remains  constant,  and  it  is  via  the  exponential  modulation  that  higher  frequencies  are 
analyzed.  Exponential  modulation  corresponds  to  a  linear  shift  of  the  frequency  con¬ 
centration.  Thus  in  the  Weyl-Heisenberg  case  the  extent  of  localization  in  time  (and 
frequency)  remains  constant.  In  the  affine  case  higher  frequencies  are  analyzed  using 
narrower  functions.  More  precisely,  the  time  concentration  of  the  wavelet  shrinks  in 
size  as  m  gets  large.  This  feature  is  useful  in  ‘zooming  in  ’  on  details  of  a  signal. 
For  instance  in  image  analysis  if  one  were  trying  to  identify  the  location  of  an  edge, 
it  is  necessary  to  be  able  to  identify  a  high  frequency  component  which  is  sharply 
localized  in  time.  The  trade-off  is  that  in  the  frequency  domain,  the  frequency  con¬ 
centration  gets  wider  at  higher  frequencies.  Simply  stated  affine  wavelets  allow  us 
to  resolve  lower  frequencies  with  greater  accuracy  than  higher  frequencies.  In  many 
applications  this  is  a  very  natural  phenomena.  For  example,  processing  of  acoustic 
signals  by  the  human  cochlea  can  be  modeled  in  this  manner  (see  [YWS92]). 

2.3  Finite  Wavelet  Approximations 

In  practice  it  is  necessary  to  consider  only  finitely  many  terms  in  the  wavelet  decom¬ 
position.  This  leads  to  the  question  of  how  well  a  given  function  can  be  represented 
by  truncations  of  a  wavelet  decomposition.  In  general,  arbitrary  truncations  can  re¬ 
sult  in  arbitrarily  large  errors.  However  in  the  case  where  the  functions  of  interest 
are  well-localized  in  time-frequency,  it  is  possible  to  bound  the  approximation  error 
associated  with  particular  types  of  truncations.  Daubechies  in  [Dau90]  examined  this 
problem,  and  derived  upper  bounds  on  the  approximation  error  for  a  particularly 
important  class  of  truncations.  Suppose  we  are  interested  in  functions  /  which  are 
essentially  time-limited  to  [— T,  T]  and  essentially  band-limited  to  the  frequency  range 
[fl0,  fli]  (J[-fi0,  — fii]-  That  is,  we  assume  that  the  functions  /  of  interest  to  us  are 
such  that, 

11(1-  Qt)/H  <  €  and  11(1- (i-a,  -  Pu0))f\\  <  e.  (2.3.27) 
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Given  an  affine  frame  {fim,n}  where  the  analyzing  wavelet  is  itself  well  localized  in 
time-frequency,  we  can  ask  the  following  question:  for  any  given  e  >  0,  is  there  a  finite 
subset  Bc(T,Q,o,£li)  C  Z2  such  that  the  error  in  approximating  /  by 

f=  £  (2.3.28) 

(m,n)sBe(T, Qo,Qi ) 

is  less  than  e||/||.  The  answer  to  this  question  was  provided  by  Daubechies  in  [Dau90]. 
For  a  function  /  which  is  essentially  concentrated  (as  defined  by  Equation  (2.3.27)) 
on  the  rectangle 

Qf  =  [— T,  T]  x 

in  time-frequency,  it  is  possible  to  construct  an  enlarged  box  containing  Q /  such  that 
by  including  in  BC(T,  fl0,  fix)  all  indices  (m,  n)  corresponding  to  wavelets  i>m,n  with 
concentration  centers  in  this  enlarged  box,  the  error  in  approximating  f  by  f  (2.3.28) 
is  bounded  above  by  e.  A  precise  statement  of  this  result  is  in  the  following  theorem. 

Theorem  2.7  ([Dau90])  Let  be  a  frame  for  L2 (IR)  with  frame  bounds  A  and 

B,  and  associated  dual  frame  {S-1  hm<n} .  Assume  that 

K(u>)|  <  C\uf  {l  +  u2)-(a+M\ 

with  (3  >  0  and  a  >  1  .Also  assume  that  for  some  7  >  1/2, 

/(!  +  x2y  \h(x)\2  <  00. 

Now  if  we  choose  T  >  0,  and  0  <  fio  <  then  for  any  e  >  0,  there  exists  a  finite 
subset  Be(T,Qo,Sli)  C  Z2,  such  that  for  all  f  G  L2(IR)  , 

II/-  {f,hmyn)  S~lhmy7l || 

(m,n)es£(T, no.no 

<  (f) 1/2  [11(1  -  <Jr)/ll  +  11(1  -  Pa,  +  +  <11/10 
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Remarks:  The  error  bounds  of  Theorem  2.7,  are  derived  using  the  dual  frame  re¬ 
construction  formula  and  cannot  trivially  be  extended  to  the  frame  reconstruction 
formula.  This  is  primarily  because  in  general  the  dual  frame  elements  will  not  be 
generated  as  translates  and  dilates  of  a  single  function  and  may  have  localization 
properties  different  from  those  of  the  original  wavelets.  However  we  note  that  in  the 
case  where  the  frame  is  tight  (A  =  B),  Theorem  2.7  as  stated  above  is  also  applicable 
for  the  frame  reconstruction  formula  since  in  this  case,  the  frame  operator  is  a  multiple 
of  the  identity.  In  particular,  for  a  tight  frame  {/,  5_1/iTO,ra )  =  A~1  {/,  hm,n). 

2.3.1  Robustness  of  Frame  Representations 

As  noted  in  [Dau90],  time- frequency  localized  frame  representations  are  in  general 
quite  robust  against  perturbations  of  the  coefficient  values.  This  insensitivity  can  be 
attributed  to  both  time-frequency  localization  and  the  redundancy  of  frames.  Time- 
frequency  localization  helps  by  localizing  errors  due  to  particular  coefficients;  i.e.  in 
any  given  region  of  time-frequency,  errors  in  the  coefficient  values  cannot  all  contribute 
equally  to  errors  in  the  frame  representation. 

Robustness  due  to  redundancy  of  the  frame  can  be  best  described  in  terms  of  the 
range  of  the  operator  T  defined  in  Section  2.2.3.  As  noted  earlier  (c.f.  remark  following 
Equation  2.2.14),  the  closed  range  of  F,  for  a  frame  whose  elements  are  not  linearly 
independent,  is  a  proper  subspace  of  i2{7L).  Therefore  if  the  frame  reconstruction 
formula  (Theorem  2.4)  is  applied  to  vectors  c  E  i2,  which  are  not  necessarily  in  the 
range  of  T,  i.e.  if  we  consider  the  formula, 

^  ^  CnS  hn, 
n 

then  this  consists  of:  (1)  a  projection  of  £2onto  Ran  T,  and  (2)  inverting  F  on  its 
range.  Since  errors  in  the  coefficients  will  in  general  ‘live’  on  all  of  l'2,  application 
of  the  reconstruction  formula  will  reduce  the  norm  of  error  component  due  to  the 
projection  onto  Ran  T.  The  greater  the  redundancy  of  the  frame,  the  smaller  the 
range  of  F.  Hence  for  very  redundant  frames,  this  reduction  in  the  effects  of  errors  is 
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more  pronounced. 

The  robustness  of  redundant  frame  representations  is  a  very  useful  property  when 
dealing  with  numerical  computation  of  wavelet  decompositions,  as  the  effects  of  nu¬ 
merical  errors  are  thus  reduced. 


2.4  Computing  Wavelet  Frame  Decompositions 


In  principle,  given  an  affine  frame  {V'm.n},  and  a  function  /  €  L2(1R)  ,  the  coefficients, 

C,n,n(/)  = 


may  be  computed  by  first  computing  the  elements  of  the  dual  frame  {S~li/>min}  using 
the  Neumann  series  expansion  for  the  inverse  frame  operator  S~ 1  and  then  computing 
the  inner  products  of  /  with  elements  of  the  dual  frame.  However  in  practice,  this 
may  prove  to  be  an  extremely  cumbersome  computation,  especially  in  cases  where  the 
frame  is  not  close  to  being  a  tight  frame  ( B/A  —  1).  By  the  frame  property,  we  know 


that 


II- 


2 


;S\\< 


B/A  -  1 


A  +  B~"  B/A  +  1 ' 

Hence  the  rate  of  convergence  of  the  Neumann  series  (2.2.16),  is  governed  by  how 

‘snug’  the  frame  is  (i.e.  by  how  close  B/A  is  to  1). 


Computation  via  Sampled  Convolutions 

In  the  case  of  a  tight  frame,  the  computation  is  greatly  simplified  since  in  this  case 

Furthermore  the  inner  products  (/,  are  easily  computed 

by  noting  that  these  are  simply  regular  samples  of  convolutions.  More  precisely, 

(/,VW i)  =’  [  f{x)a,Q/2'ip(a%lx  -  nb0)dx 

Jn 

=  (/  *  Vvo)  ( nbo ), 

where  ^>~,o  denotes  the  reflection  of  i>mfi  about  the  origin,  i.e.  =  /4’m,o{-x). 

Hence  using  the  convolution  theorem  for  Fourier  transforms,  {(/,  's  easily 

computed. 
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The  above  formulation  may  also  be  used  to  simplify  computation  in  the  case  of 
frames  which  are  not  tight.  This  can  be  done  by  using  the  fact  that  the  frame  operator 
5  (and  therefore  S-1)  is  self-adjoint,  to  write, 

Thus  the  inverse  frame  operator  series  need  only  be  applied  once  to  compute  5-1/, 
and  then  the  coefficients  may  be  computed  as  samples  of  convolutions  as  described 
above. 

Normal  Equations 

Expansions  with  respect  to  frames  are  in  general  not  unique  due  to  the  lack  of  linear  in¬ 
dependence.  It  can  be  seen  however,  from  Theorem  2.5  that  among  all  possible  expan¬ 
sion  coefficients  for  a  given  function  /,  the  coefficient  sequence  c(f)  =  {(/,  5-1i ipm,n)} 
is  optimal  in  the  sense  of  minimum  £2(Z2)  norm.  That  is,  if  {am>n}  is  such  that 

f  —  'y  '  y  ' 

m  n 

then, 

E  £  |  (/>  •S'-Vm.n)  |2  <  E  £  K,»|2  • 

m  n  m  n 

This  property  may  be  utilized  to  formulate  the  coefficient  computation  problem  as  a 
constrained  optimization  problem.  Namely, 

minimize  EEK.,12 

m  n 

such  that  /  =  EE 

77i  n 

Consider  now  the  case  of  a  finite  wavelet  approximation  to  /, 

f  ~  f  —  y  'j  ^-rn, 71^771^71  •  (2.4.29) 

Let  Span{/i„}  denote  the  closed  linear  span  of  the  vectors  {hn}.  It  is  clear  that 
/  can  be  represented  exactly  by  the  expansion  in  (2.4.29)  if  and  only  if  f  6 
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Span{Vw,  ( m,n )  £  X},  where  Span{/in},  denotes  the  closed  linear  span  of  the  vec¬ 
tors  {hn}  .  If  f  g  Span{^mn,  (m,n)  6  1}  then  the  ‘best’1  approximation  to  /  in 
terms  of  the  finite  subset  of  frame  elements  with  indices  in  X  is  the  projection  of  / 
onto  Span {^mn,  (m,  n)  £  X}.  In  this  case,  we  would  like  to  compute  the  coefficients 
of  expansion  of  the  projection  of  /  onto  Span{^mTl,  ( m,n )  El}.  It  can  easily  be 
shown  that  any  finite  number  of  vectors  form  a  frame  for  their  span  (c.f.  [Pat91]).  In 
this  case  we  can  reformulate  the  optimization  problem  stated  above  as, 


minimize  X  l^m^n  | 

(m,n)eX 

such  that 

||/  —  'y  }  &m,n',Pm,n\\  min  ||  f  —  ^  }  ®-m,n 1 1  • 

(m,n)eX  (m,n)eX 

(2.4.30) 


The  optimization  problem  defined  by  (2.4.30)  is  convex.  Hence,  one  means  of  solving 
(2.4.30)  is  by  direct  optimization. 

Consider  now  the  normal  equations  associated  with  the  problem  of  minimizing 
11/  ~  HfcLi  Cfc/tjc |p.  The  normal  equations  are  obtained  via  the  first-order  optimality 
condition, 

d  N 

^~\\ /-X>M2  =  0’  k  =  l,...,N, 

OCk  k= l 

and  may  be  written  as, 

P  C  =  W  (2.4.31) 


where,  P  is  the  N  x  N  matrix  defined  by, 


P  =  [Pkj]  =  [(hk,hj)],  (2.4.32) 

W  =  [(f,h1),...,(f,hN)}T,  (2.4.33) 

and  C  =  [ci, . . . ,  cjv]  is  the  vector  of  coefficients. 

1With  respect  to  the  L2(fft)  norm. 
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Suppose  now  that  we  are  searching  for  the  minimum  norm  solution  to  the  normal 
equations  (2.4.31).  It  is  well-know  that  the  Moore-Penrose  generalized  inverse  P t  = 
(P* P)-1  P* ,  generates  the  minimum  norm  solution  to  (2.4.31),  as 

C  =  P^W, 

whenever  the  (P*P)-1,  exists.  In  cases  where  the  problem  is  very  ill-conditioned, 
stabilizing  techniques  such  as  singular  value  decomposition  may  be  used  to  compute 
the  generalized  inverse. 

Thus  the  solution  to  the  normal  equations  via  the  generalized  inverse  provide 
another  means  of  computing  the  frame  decomposition  coefficients.  For  numerical  so¬ 
lutions  where  we  are  dealing  with  discrete  samples  of  the  functions  the  inner  products 
in  the  definition  of  the  normal  equations  may  be  approximated  by  sums. 

Remarks:  In  searching  for  the  minimum  norm  sequence  of  coefficients  as  above, 
where  only  a  finite  number  of  coefficients  are  involved,  technically  we  are  no  longer 
computing  the  coefficients  cm>n  =  (/,  5-1V’m,n)  with  respect  to  the  original  frame 
{V’m,n}mn g£-  In  fact  what  we  are  doing  is  computing  the  frame  expansion  of  the 
orthogonal  projection  of  /  onto  H  =  Span{t/>TOn,  ( m,n )  6  I},  with  respect  to  the 
frame  {ipmn,  ( m,  n )  £  X},  for  H.  Hence,  in  general  the  set  of  coefficients  {ck}k=i  will 
not  agree  with  the  corresponding  coefficients  computed  with  respect  to  the  infinite 
frame.  In  cases  where  the  index  set  J,  is  large  and  such  that  {V’mn?  (m,n)  6  J}, 
covers  ‘most’  of  the  time-frequency  concentration  of  /,  the  error  between  the  two  sets 
of  coefficients  will  be  small.  The  advantage  to  computing  the  coefficients  with  respect 
to  the  new  (finite)  frame  is  that  in  general  the  quality  of  approximation  will  be  better 
than  if  a  truncation  of  the  infinite  frame  expansion  were  used  (see  Appendix  C). 
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Chapter  3 


Rational  Wavelet 


Decompositions  of  Stable  Linear 
Systems 


In  this  chapter  we  study  wavelet  decompositions  of  the  Hardy  space  H2(n+),  (where 
n+  denotes  the  half-plane  -Re  s  >  0)  via  real-rational  analyzing  wavelets.  The  re¬ 
striction  of  rationality  of  the  analyzing  wavelet  leads  us  to  consider  affine  frames  for 
H2(n+).  The  space  H2  was  in  fact  considered  among  the  first  investigations  of  con¬ 
tinuous  wavelet  transforms  [GM84].  With  the  recent  development  of  multiresolution 
analyses  and  the  resulting  orthonormal  bases  of  wavelets  for  L2(1R)  ,  attention  has 
shifted  primarily  to  the  study  of  wavelet  transforms  on  L2(IR)  .  Our  use  of  frames 
rather  than  orthonormal  systems,  is  further  motivated  by  recent  results  which  have 
shown  that  there  do  not  exist  any  ‘nice’  orthonormal  wavelet  bases  for  H2(n+)  as¬ 
sociated  with  a  mutiresolution  analysis.  This  result  is  discussed  in  further  detail  in 
Section  3.6. 

Of  particular  interest  here,  are  functions  in  H2(n+)  arising  as  Laplace  transforms 
of  real- valued  functions  in  L2(0,  oo)  ,  as  these  include  transfer  functions  of  stable, 
causal,  linear  time-invariant  systems.  We  use  the  notation  Hj.(II+)  for  this  class  of 
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functions.  In  Section  3.3,  it  is  shown  that  by  starting  with  a  real-rational  analyzing 
wavelet  and  appropriately  grouping  terms  in  a  wavelet  decomposition,  functions  in 
Hm(n+)  may  be  represented  as  infinite  sums  of  time-frequency  localized,  real-rational 
functions  of  bounded  McMillan  degree.  We  refer  to  such  representations  as  ,  wavelet 
system  (WS)  decompositions  of  Hjl(II+).  One  of  the  main  results  of  this  chapter  is 
Theorem  3.3,  which  shows  the  existence  of  such  real-rational  (WS)  decompositions. 
A  real-rational  function  in  H2(II+)  can  of  course  be  viewed  as  the  transfer  function 
of  a  stable,  finite-dimensional,  causal,  linear  time  invariant  system.  Thus  wavelet 
system  decompositions  are  representations  of  (possibly  infinite-dimensional)  systems, 
as  infinite  parallel  connections  of  time-frequency  localized  finite-dimensional  systems. 
Such  representations  are  useful  in  constructing  finite- dimensional  approximations  to 
systems.  In  Section  3.5  we  consider  the  problem  of  constructing  a  causal  rational  ap¬ 
proximation  to  a  causal  nonrational  transfer  function  by  selecting  a  finite  number  of 
terms  from  the  WS  expansion.  For  dynamical  systems  with  transfer  functions  which 
are  well-localized  in  time-frequency,  this  approach  can  generate  low-order  approxi- 
mants.  Some  aspect  of  WS  decompositions,  including  minimal  state-space  realizations 
are  investigated  in  3.3. 

3.1  Background  on  Hardy  Spaces 

In  this  section  we  review  some  basic  properties  of  the  Hardy  spaces  Hp(n+)  where 
n+is  the  right-half  complex  plane  5?e  s  >  0.  For  a  more  complete  exposition  of  the 
properties  of  these  spaces  see  [Hof88,  Dur70,  Gar81j. 

Definition  3.1  Given  a  function  F  which  is  analytic  in  n+,  F  is  said  to  belong  to 
the  class  Hp(n+),  0  <  p  <  oc  if 

/oo 

\F(x  +  iy)\p  dy  <  oo.  (3.1.1) 

-oo 

A  function  F  which  is  analytic  in  II+;  is  said  to  belong  to  class  Hco(II+)i/, 

sup \F{x  +  iy)\  <  oo.  (3.1.2) 

ar>0 
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For  1  <  p  <  oo,  Hp(n+)  is  a  Banach  space  with  norm  defined  by  (3.1.1)  for  p  <  oo 
and  by  (3.1.2)  for  p  =  oo. 

Some  of  the  most  basic  properties  of  HP(II+)  are  captured  by  the  following  theorem 
(c.f.  [Hof88]). 

Theorem  3.1  Given  F  £  HP(II+),  the  following  are  true: 

(1)  The  nontangential  limit  of  F  exists  at  almost  every  point  on  the  imaginary  axis. 

(2)  The  boundary  value  function  of  F  is  in  IF  (JR.)  and, 

F(x  +  iy)  =  -  f  F(iu) . yTT~ - \2du'  x  >  0 

f  i] R  xl  +  (y  -  u>Y 

(3)  The  functions  Fx(y)  =  F( x  +  iy )  converge  in  Lp  norm  to  F{iy )  as  x  — »  0. 

Note:  The  boundary  value  function  (c.f.  [Gar81])  of  F  6  HP(II+)  is  defined  via  the 
limit  F*(iu),  of  F(s)  as  s  ^  iu,  nontangentially  i.e. 

F*(iu>)  =  lim  F(s),  a  >  0, 

Aa  (iui)9s—*iui 

where  for  a  >  0,  A a(iu)  is  the  cone  defined  by, 

Aa(iu! )  =  {s  =  x  +  iy  £  Il+  :  \y  —  w|  <  ax}  . 

Of  particular  interest  in  the  context  this  dissertation  is  the  space  H2(II+).  The 
space  H2(n+)  is  a  Hilbert  space  with  the  inner  product  defined  by, 

/OO  _ 

F{iu)G{iu)du . 

-OO 

One  can  view  H2(n+)  as  a  space  whose  elements  are  transfer  functions  of  causal  input- 
output  stable  linear  systems.  This  statement  is  precisely  captured  by  the  following 
classical  result  of  Paley  and  Wiener. 
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Theorem  3.2  (Paley-Wiener)  A  complex-valued  function  F  is  in  H2(II+)  if  and 
only  if, 

Jr  co 

1  f(t)e~stdt, 
o 

for  some  f  £  L2(0,oo)  .  Furthermore  this  representation  is  unique. 

Thus  by  the  above  theorem,  H2(II+)  arises  as  the  Laplace  transform  of  square- 
integrable  functions  on  the  half-line  [0,oo).  For  any  F  £  H2(II+)  define  the  restriction 
of  F  to  vertical  lines  in  the  right  half-plane  by, 

Fx(y)  =  F(x  -1-  iy),  x  >  0. 

Also  define  the  Fourier  transform  along  vertical  lines  in  II+by, 

Fx{u)  =  [  F(x  +  iy)eiuydy 

2ir  J] r 

for  F  £  H2(n+).  Note  that  this  definition  of  the  Fourier  transform  on  H2(II+)  actually 
corresponds  to  our  previous  definition  of  the  inverse  Fourier  transform.  By  Theorem 
3.1,  we  know  that  boundary  values  of  functions  in  H2(II+)  are  in  L2(1R)  .  However, 
since  by  the  Paley-Wiener 

-7-  [  F(iu)elwtdu  =  0  for  t  <  0,  (3.1.3) 

27T  Jr 

boundary  values  of  functions  in  H2(n+)  comprise  a  subspace,  denoted  H+(1R),  of 
L2(1R)  characterized  by  the  Fourier  transform  vanishing  on  the  negative  half-line 
(3.1.3).  It  can  also  be  shown  that  H^.(IR)  is  a  closed  subspace  of  L2(IR)  .  The  orthog¬ 
onal  complement  of  H+(IR)  in  L2(1E)  ,  denoted  Hi(IR),  consists  of  functions  whose 
Fourier  transforms  vanish  on  the  positive  half- line.  Elements  of  Hi  (IR)  are  boundary 
values  of  functions  in  H2(n~)  where  n_  is  the  half-plane  3ie  s  <  0.  Functions  in 
H2(n+)  are  completely  characterized  by  their  boundary  values  since  by  Theorem  3.1, 
the  boundary  value  function  F(iu>),  can  be  analytically  continued  to  recover  F  on  the 
right-half  complex  plane  via  the  Poisson  integral.  By  the  maximum  principle,  we  also 
know  that  ||F||#2  =  !|A||£,2  where  F  is  the  nontangential  limit  of  F. 


38 


Remarks:  In  what  follows,  we  take  the  liberty  of  identifying  H2(n+)  with  H^(1R), 
keeping  in  mind  that  element  of  the  latter  are  boundary  values  of  functions  in  H2(Et+). 
We  use  u  to  denote  the  real  frequency  variable  and  write  F(u)  to  denote  the  boundary 
value  function  F{iu>).  The  complex  frequency  variable  will  be  denoted  by  s  —  x  +  iy. 
We  also  adopt  the  following  notation  in  the  remainder  of  this  dissertation: 

RH2(n+)  :  Real-rational  functions  in  H2(II+),  that  is  rational  functions  in  H2(II+) 
with  real  coefficients. 

HR(n+)  :  Functions  in  H2(II+)  which  are  Laplace  transforms  of  real-valued  functions 
in  L2(0,  oo)  . 

Note  that  RH2(II+)  C  H^(n+). 

3.2  Wavelet  Transforms  on  H2(II+) 

When  H2(n+)  is  considered  as  a  closed  subspace  of  L2(1R)  ,  it  is  easy  to  see  how 
wavelet  theory  developed  for  L2(JR)  may  be  applied  to  H2(II+).  Let  $  G  H2(n+)  be 
an  admissible  analyzing  wavelet  for  H2(II+),  where  admissibility  is  now  defined  by 
applying  the  admissibility  condition  (2.2.8)  to  the  restriction  of  $  to  vertical  lines  in 
II+,i.e. 

|  -  2 

CV.  =  /  - — — - du  <  oo,  x  >  0.  (3.2.4) 

J  R  |«| 

As  in  the  case  of  L2(IR)  it  is  possible  to  define  a  continuous  wavelet  transform  on 
H2(n+)  with  respect  to  ^  as  follows.  Let, 

* {a'b\s)  =  (TibDa)V{s)  =  |a|1/2  ^(as  -  ib),  a,  b  G  IR,  a  >  0. 

Then  for  any  F  G  H2(II+)  define  the  continuous  wavelet  transform  on  the  line  !Re  s  —  x 

by, 

WxF(a,b)=(f,^)=  f  Fx{y)^'b\y)dy.  (3.2.5) 

j  IR. 

Inversion  of  this  transform  is  accomplished  by  the  inversion  formula, 
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To  define  a  discrete  wavelet  transform  with  respect  to  T,  let 


®m,n(-s)  =  a^24>(a™.s  -  inb0)  a0  >  0,  -Re  5  =  x. 

Assume  41  satisfies  the  admissibility  condition  and  let  ao  >  0  and  b0  be  such  that  the 
family  {41 m,n}m,neZ j  is  a  frame  for  V  which  is  a  closed  subspace  of  L2(IR)  .  Thus  for 
any  F  £  H2(II+)  we  have  that 

Fx(y)  =  £  £  (f,  5_1^-,n)  *m,„, 

m  n 

where  S  is  the  frame  operator  associated  with  the  frame 


3.3  Wavelet  System  Transfer  Functions 

Here  we  restrict  discussion  to  affine  frames  constructed  on  the  imaginary  axis.  Hence 
we  require  admissibility  of  the  analyzing  wavelet  4!($)  for  IRe  s  —  0  and  consider  the 
boundary  values  of  H2(n+). 

Let  denote  the  nontangential  limit  of  an  admissible  analyzing  wavelet  41(s) 
and  let  (a0,b0)  be  such  that  (9,a0,b0)  generates  an  affine  frame  for  H2(n+).  Then 
any  F  £  H2(n+)  can  be  represented  as 

F(U)  =  £  £  Cm,n^m,n(^  (3‘3-6) 

m  n 

where  as  before  $ m>n( w)  =  nb0),  and  one  set  of  suitable  coefficients  may 

be  computed  using  the  inverse  of  the  frame  operator  (see  (2.2.17)).  Assuming  that 
F  £  H2(n+)  is  the  Laplace  transform  of  a  real- valued  weighting  pattern  in  L2(0,  00)  , 
(i.e  F  £  H|l(n+)),  and  41  is  a  real-rational  analyzing  wavelet  (41  £  RH2(n+)),  we  can 
ask  the  question;  is  an  arbitrary  truncation  of  the  frame  expansion  (3.3.6)  the  transfer 
function  of  a  real  weighting  pattern?  Since  RH2(n+)  arises  as  the  image  under  the 
Laplace  transform  of  a  class  of  real-valued  weighting  patterns,  if  the  analyzing  wavelet 
$  £  RH2(n+)  then  4>m,„(-u;)  =  ^Jw)  for  n  =  0  and  m  £  Z.  However,  this 
symmetry  is  violated  for  n  ^  0.  Hence  arbitrary  truncations  of  the  series  (3.3.6),  will 
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not  in  general  result  in  real-rational  functions,  i.e. 

£  $  RH2(II+), 

n£j 

for  arbitrary  finite  index  sets  J .  This  observation  motivates  the  following  definition 
of  a  wavelet  system  transfer  function. 

Definition  3.2  Given  $  £  RH2(II+)  C  H2(II+)  a  wavelet  system  transfer  function 
is  defined  as  a  function  of  the  form, 

Gm,n(s)  =  aVmtn(s)  + 

where  a  denotes  the  complex  conjugate  of  a . 

Proposition  3.1  Let  F(s)  be  a  real-rational  function  in  H2(II+).  Then,  the  wavelet 
system  transfer  function  defined  by 

Gm'n(s)  =  aFm,„(s)  +  aV»(5)> 

is  also  a  real-rational  function  in  H2(II+). 

Proof:  Since  H2(II+)  is  a  closed  linear  space,  it  is  clear  that  Gm’n(s), £  H2(II+).  It 
is  also  clear  that  Gm,n(s )  is  rational.  To  show  that  Gm,n(s )  is  real-rational  we  only 
need  to  show  that 

Gm'n(- w)  =  Gm'n(u). 

Now, 

Gm'n(-u )  =  aFm,n(-uj)  +  aFm,-n(-u) 

=  aa™^2 F(—a™u>  -  nbo )  +  aa™t2F{— a™ui  +  nbo ) 

=  aa™^2  F(-(a™u)  +  nb0 ))  +  aa™^2 F(-(a™cj  -  nb0 )) 

=  aa^2f(a™u  +  nbo)  +  aa™^2 F(a™u>  —  nbo ) 

—  CxF m1— n(^)  T  0/.F 7n,7i(^) 

=  ■ 
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One  set  of  suitable  coefficients  in  the  wavelet  decomposition  (3.3.6),  is  given  by 
cm,n  =  (F,  The  utility  of  the  notion  of  wavelet  system  transfer  functions 

hinges  on  the  requirement  that 

Sn  =  (F,  =  (F,  S^'im-n)  =  c^.  (3.3.7) 

However,  in  general  there  is  no  reason  to  expect  these  coefficients  to  satisfy  this  kind 
of  symmetry  condition.  We  now  show  that  this  symmetry  is  satisfied  for  a  sufficiently 
interesting  class  of  functions  in  H2(n+),  namely  the  class  Hjl(n+). 

As  a  first  step,  in  verifying  condition  (3.3.7),  consider  the  following  proposition. 

Proposition  3.2  Let  f  and  g  be  boundary  values  of  functions  in  H2(n+).  Also  let  f 
and  g  be  such  that  /(  —  to)  =  f(u>),  and  g(—u)  =  g(aj).  then, 

(f  ■>  9m,n)  —  ( fi9m,—n )• 

Proof:  (See  Appendix  B) 


Remark:  Note  that  if  we  were  to  consider  only  tight  frames,  Proposition  3.2  is 
sufficient  to  guarantee  that  (3.3.7)  is  satisfied  for  all  F  6  H|l(n+),  since  in  the  case 
of  a  tight  frame  the  frame  operator  is  a  multiple  of  the  identity. 

The  following  lemma  states  that  for  general  affine  frames,  (3.3.7)  is  satisfied  for 
all  F  £  H^(n+). 

Lemma  3.1  Let  f  and  g  be  boundary  values  of  functions  in  H2(n+)  such  that 
f(—ui)  =  f{a>),  and  g(-u> )  =  g(w).  Let  ao  >  0  and  bo  be  such  that  ngZ 

is  a  frame  and  let  S  be  the  associated  frame  operator.  Then, 


Cm,n  —  (^f)S  9m, n ^  —  {f  >  S  ^ 9m,— n )  —  Cm,- 


Proof:  Since  the  frame  operator  S  is  self-adjoint, 


(f,S  1gm,n)  =  (s  1f,gm,n)- 
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Therefore  by  Proposition  3.2  we  need  only  check  that  if  /(-m)  =  f(u )  then 


S  1/(-w)  =  5_1/(w).  Now, 


A+B 


k= 0 


E 


O  \  k 

s 


A  +  B 


(3.3.8) 


Let 


Gm,n(u )  =  (f ,  gm,n)  gm,n(u)  +  (f,gm-n)gm  ,-n(u)  m  6  Z  ,  n  e  Z+ ,  n^O 

Gm,0(o;)  =  (/,flfm,o)fifm,o(w)  rue's.. 


Then 


(S/K-W)  =  2 

m  L 

=  E 


n=l 

oo 


Gm’°(w)  +  E  Gm’n(w) 

71=1 


=  ($/)(")• 


Thus, 


2 

A  +  B 


and  together  with  (3.3.8),  we  get  ( S  1  /)(— w)  =  (S-1/)(w)>  which  completes  the 
proof.  ■ 


As  a  step  in  formalizing  later  discussion,  let  us  define  a  transform  operator  which 
may  be  associated  with  the  grouping  suggested  by  the  definition  of  WS  transfer  func¬ 
tions.  We  refer  to  this  transform  as  a  wavelet  system  transform  on  II^(II+).  We  begin 
by  defining  the  space  which  contains  the  image  of  H|^(n+)  under  the  WS  transform 
operator. 

Definition  3.3  Define  7 Z(J)  as  the  space  of  sequences  of  functions  {Fk}keJ  such  that 
Fk  e  RH2(n+)  c  H2(n+)  for  all  k£j,  and 

II  E  Fk\\h  <  (3-3-9) 

kej 

where  J ,  is  a  countable  (index)  set. 
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Definition  3.4  (Wavelet  System  Transform)  Let  $  E  H2(n+)  be  an  admissible 
real-rational  analyzing  wavelet  and  ao  >  0,  b0  such  that  (\P,  a0,  i>o)  generates  an  affine 
frame  for  H2(II+).  Let  S  be  the  associated  frame  operator.  The  Wavelet  System 
transform  is  defined  by  the  operator  Hq  :  H|l(n+)  — *•  7l(7Z.  x  Z+), 

H*F  =  {F™nmeZ,  n6Z+, 

where  for  any  F  E  Hjp> (n+),  Fm,n  is  given  by, 

Fm,n(u)  =  + 

m  E  Z,nE  2Z+\{0}, 

Fm-V)  =  (F,S-^m,  0)fmio(w)  mGZ.  (3.3.10) 

The  operator  Hq  defines  an  invertible  isometry  from  H2(II+)  to  1Z{7L  x  Z+),  (where 
(3.3.9)  defines  the  norm  on 

Remark:  A  second  definition  could  be  used  for  the  space  containing  Ran  Hy.  We 
could  define  7£(J)  as  in  Definition  3.3,  with  (3.3.9)  replaced  by  the  requirement  that 

Y  \\Fk\\h  <  °°- 

k£j 

This  is  possible  using  the  knowledge  that  {4,„l!n}  is  a  frame  and  that  ||4,m,n||  =  C 
for  all  (m,  n)  E  Z2  and  some  finite  constant  C.  To  explicitly  see  how  this  works,  let 
F  E  H^(n+),  (thus  (3.3.7)  is  satisfied),  and  let  A  and  B  be  the  upper  and  lower  frame 
bounds  associated  with  the  frame  Thus, 

Yj  ||Tm’”||2  =  ||cmi0^m,o||2  +  "Ys  m,n\i 

meZ,neZ+  meZ  ngZ+\{o} 

<  Yj  ||cm,0^m,o|!2  +  Yj  (||cm,n^,m,7i||  +  ]|cro,-n  ^m:n  || ) 

meZ  „ez+  \{o} 

<  C2  Y,  lcm,o|2  +  Yj  (lcm,n|  +  |cm,-n|)' 

neZ+  \{0} 
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<  C2  |cm,o|2  +  4  lc™>«|2 

TOgZ  neZ+\{0} 

5:  C2  2|cm)o[2+4  ^  |cm,n|2 

">eZ  ne  Z+  \{o} 

=  2C2  £  |cm,n|2  <  2C2B||F||2  <  oo 
m,n€  Z 

However,  in  this  case  would  no  longer  be  an  isometry. 

Inversion  of  Hq  involves  a  simple  summation  of  terms  in  the  sequence  {Fm,n},  and 
together  with  Proposition  3.1  and  Lemma  3.1,  gives  us  the  following  decomposition 
theorem  for  H|l(n+). 

Theorem  3.3  (Wavelet  System  Decomposition  of  Hjpl(n+))  Let  ^  G  RH2(n+) 
be  an  admissible  analyzing  wavelet,  and  let  Oo  >  0,  bo  be  such  that  (’H,  ao,  bo)  generates 
an  affine  frame  for  H2(n+).  Then,  any  F  in  H|l(n+)  may  be  represented  as, 

oo 

F  =  (3.3.11) 

TO  n=0 

where,  each  Fm,n  (G  RH2(n+)/)  is  a  wavelet  system  transfer  function  defined  by, 

pm,n  =  (V,  S-l9m,n)  *m,n  +  {F,  S~'Vm, »}*  m,-n,  ™  G  Z,  »  G  Z+  \{0} 
Fm’°  =  (f, S-'Vmfi)  Vm,o  mGZ.  (3.3.12) 


Remark:  Of  subsequent  significance  is  the  fact  that  if  the  analyzing  wavelet  is  of 
degree  N,  then  the  degree  of  each  wavelet  system  transfer  function  Fm,n,  is  bounded 
by  2 N. 


3.3.1  Examples:  Rational  Analyzing  Wavelets  for  H2(II+) 


Example  1 

As  an  example  of  a  rational  analyzing  wavelet  for  H2(n+),  consider  the  function 

1 


*(«)  = 


(s  +  q)2  +  £2  ! 


>  0. 
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It  is  easily  verified  that  $(s)  is  in  H2(II+)  and  furthermore  $  is  an  admissible  analyzing 
wavelet  i.e., 


for  *  >  0. 


Taking  the  inverse  Laplace  transform  of  T,  the  weighting  pattern  corresponding 
to  $  is  given  by, 


V>(*)  =  < 


£  1e  7tsin (£f) 

0 


for  t  >  0 
for  t  <  0 


The  H2(n+)  norm  of  can  also  be  found  to  be, 

Figures  3.1  and  3.2  show  this  analyzing  wavelet  4/,  evaluated  on  the  imaginary  axis, 
and  the  corresponding  weighting  pattern  tp  for  different  values  of  £  and  7. 


Constructing  an  Affine  Frame  for  H2(II+)  from  4>:  For  the  purpose  of  numer¬ 
ically  determining  values  of  the  dilation  stepsize  ao  and  translation  stepsize  &o  such 
that  (^,ao,6o)  generates  an  affine  frame  for  H2(II+)  we  can  utilize  Theorem  2.6  and 
Corollary  2.1.  Figures  3. 3-3.4  show  the  results  of  applying  Theorem  2.6  and  Corollary 
2.1  for  the  case  where  ao  =  2.0.  Hence  for  a0  =  2.0  and  0  <  bo  <  17  the  sequence 
{'f  m,n}  forms  an  affine  frame  for  H2(II+). 


Example  2 


For  a  second  family  of  rational  analyzing  wavelets  for  H2(n+),  consider  the  functions, 
^k(s)  =  7— Tp  P>  0;  k  =  2,3,.... 

The  corresponding  weighting  patterns  are, 

**<*>  = 

Admissibihty  is  easily  verified  since, 

1:^  ■  yhn'p 

=  (p4rj!)2(2A:~3)!(2ar2m+2<00- 
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Figure  3.3:  Estimates  of  frame  bounds,  using  H2(II+)  analyzing  wavelet  $  of  Example 
1,  with  7  =  5.0,  £  =  1.0  and  ao  =  2,  as  translation  stepsize  bo  is  varied.  Solid  curve: 
lower  frame  bound  A;  Dashed  curve:upper  frame  bound  B. 


Figure  3.4:  Ratio  ( B/A )  of  estimated  frame  bounds,  using  H2(II+)  analyzing  wavelet 
of  Example  1,  with  7  =  5.0,  £  =  1.0  and  ao  =  2,  as  translation  stepsize  bo  is  varied. 
Solid  curve:  B/A ;  Dashed  line:  constant=1.0. 
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Figure  3.5  shows  this  wavelet  and  the  corresponding  weighting  pattern  for  p  =  5.0,  k  = 
2.0.  As  in  the  last  example  the  frame  parameters  can  be  numerically  determined  with 


Figure  3.5:  H2(II+)  analyzing  wavelet  $  for  p  =  5.0,  k  =  2  and  corresponding  weight¬ 
ing  pattern 

the  help  of  Theorem  2.6.  Figures  3. 6-3. 7  show  the  results  of  this  exercise  for  the  case 
where  do  =  2.0  and  k  =  2,  and  p  =  5.0.  Hence  for  do  =  2.0  and  0  <  bo  <  17  the 
sequence  {*Fm,n}  forms  an  affine  frame  for  H2(n+). 
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30 


Translation  Stepsize  b 

Figure  3.6:  Estimates  of  frame  bounds,  using  H2(II+)  analyzing  wavelet  of  Example 
2,  with  p  =  5.0,  k  =  2.0  and  a0  =  2,  as  translation  stepsize  bo  is  varied.  Solid  curve: 
lower  frame  bound  A ;  Dashed  curve:upper  frame  bound  B. 


Figure  3.7:  Ratio  ( B/A )  of  estimated  frame  bounds,  using  H2(II+)  analyzing  wavelet 
^  of  Example  2,  with  p  =  5.0,  k  ~  2  and  do  =  2,  as  translation  stepsize  b0  is  varied. 
Solid  curve:  B/A ;  Dashed  line:  constant=1.0. 
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Aside  from  the  examples  of  the  last  section  we  can  ask  for  a  characterization  of  all 
admissible  rational  H2(II+)  analyzing  wavelets.  The  following  theorem  provides  such 
a  characterization. 

Theorem  3.4  Let  ^  £  RH2(n+);  and  let  (A,B,C)  be  a  state  space  realization  of  $ . 
Then  the  following  are  equivalent: 

(1)  'F  is  an  admissible  H2(II+)  analyzing  wavelet. 

(2)  s’F(.s)  — *  0,  as  s  — >  oo. 

(3)  CB  =  0. 


Proof:  Recall  that, 


dnf\  1  rio° 

±1))  =  ~  /  snF(s)ds. 

dtn  )  t-o  27T ij-ioo 


Thus  if  /(f),  has  a  zero  of  order  k  +  1,  at  t  =  0,  i.e. 


/too 

snF(s)ds  —  0,  to  =  0,1,...,  A. 

-too 


(3.3.13) 


This  requires  that  F(s)  goes  to  zero  faster  than  1  /sk+1,  as  s  — *•  oo.  For  rational  F(s), 
this  means, 

F(S )  -  for  5  -  °°- 

Conversely  for  F(s)  with  this  asymptotic  behavior,  we  can  deduce  that  f(t)  has  a  zero 
of  order  k  +  1,  at  t  =  0,  (c.f.  [Gui63]).  Therefore  sF(s )  — >  0,  as  s  — *■  oo,  if  and  only 
if  f(t)  has  a  first  order  zero  at  t  =  0.  This  gives  |/(t)|2,  a  second  order  zero  at  t  =  0, 
and  therefore  admissibility  of  F. 

Thus  sty(s)  ->  0,  for  s  -»  oo,  if  and  only  if  ’F,  is  admissible.  Equivalence  of 
sF(s)  ->  0,  to  CB  =  0,  is  easily  seen  from  the  Laurent  series  expansion  of  F\ 


F{s)  =  £ 
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where  the  Markov  parameters  h, t  are  given  by, 


hk  =  CAk~lB.  ■ 


3.4  Properties  of  Wavelet  System  Decompositions 

3.4.1  Time-Frequency  Localization 

As  noted  earlier,  time-frequency  localization  is  perhaps  the  most  beneficial  property 
of  affine  wavelet  representations.  Wavelet  system  transfer  functions  inherit  the  time- 
frequency  localization  properties  of  the  underlying  (rational)  wavelets.  As  in  Section 
2.2.5,  let  ff(f )  =  [n>o(f  ),u>i(f )]  denote  the  frequency  concentration  of  $  and  12(f)  = 
[fo(f),  ti(f )]  (to  >  0)  denote  the  time  concentration  of  f  (computed  via  the  inverse 
Laplace  transform).  Then  $  is  a  function  which  is  concentrated  in  the  time  frequency 
plane  on  the  rectangle  Q  =  0(f)  x  12(f)  and  each  of  the  WS  transfer  functions  are 
concentrated  on  rectangles 

Qmn  —  [aom(Wo(f)  +  n6o),a0m(a;1(f)  +  n6o)]  X  K%(f ),  amtr(f )] 

=  0( fm,„)  X  12(fm,„).  (3.4.14) 

Figure  3.8,  shows  the  distribution  of  the  rectangles  Qmn  in  the  time-frequency  plane. 
Since  the  wavelets  for  H2(II+)  are  constructed  in  the  frequency  domain,  the  time 
and  frequency  axes  are  interchanged  when  compared  with  the  corresponding  picture 
for  wavelets  constructed  in  the  time-domain,  i.e.  it  is  now  the  time  axis  which  is 
treated  logarithmically.  In  some  sense,  this  interchange  of  the  localization  in  time  and 
frequency  is  more  natural  for  the  decomposition  of  linear  systems  than  the  localization 
arising  from  wavelets  constructed  in  the  time-domain.  To  see  this,  note  that: 

•  Near  t  —  0,  while  time  localization  is  good,  the  frequency  concentration  of  each 
wavelet  system  encompasses  a  large  band  of  frequencies. 


52 


•  For  t  »  0,  frequency  localization  is  very  good  so  that  it  is  possible  to  ‘zoom’  in 
on  narrow  frequency  bands,  whereas  localization  in  time  is  poor. 

Often,  in  the  case  of  physical  systems,  high-frequency  components  of  the  impulse  re¬ 
sponse  are  localized  near  the  time-origin  (c.f.  Figure  4.4).  Furthermore  away  from  the 
origin,  the  impulse  response  decays  ‘smoothly’.  The  form  of  time-frequency  concen¬ 
tration  arising  in  WS  transfer  functions  allows  us  to  capture  this  behavior  efficiently, 
since  relatively  few  terms  are  required  near  the  time-origin  to  capture  a  broad  range 
of  frequency  components.  Moreover,  away  from  the  time-origin,  the  narrower  localiza¬ 
tion  in  frequency  allows  us  for  example  to  represent  a  narrow  band  of  low  frequencies. 

3.4.2  Poles  and  Zeros 

Often  a  great  deal  of  insight  into  the  behavior  of  a  system  may  be  derived  from  a 
description  of  the  poles  and  zeros.  Given  the  poles  and  zeros  of  the  analyzing  wavelet 
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'P,  we  may  examine  the  manner  in  which  the  poles  and  zeros  of  the  WS  transfer 
functions  are  influenced  by  the  translations  and  dilations.  Assume  first  of  all  that 
has  no  real  axis  poles  and  that  ^  is  a  degree  N  real-rational  function  in  H2(II+).  Let 
{Pk,Pk}kl i  be  the  set  set  of  poles  of  'F(s),  and  {zj,Zj}j=1  be  the  set  of  zeros  of 
Since  is  real-rational,  'F  can  be  written  as 


=  Hf)  =  Qj(f  -  Zj){3  -  Zj) 

K)  Q(s)  uk(s-Pk)(s-pky 

where,  P(s)  and  Q(s )  are  coprime.  Thus, 

q,  (s)  _  a™/2  n  j(aos  ~  zj  ~  inbo)(a%s  -zj-  inb0 ) 

-  m’n  S  Y\k{a^ s  -  pk  ~  inb0){d^ s  -  fk  -  inbo) 

a^/2  El M  ~  ra))(g  ~  7j(™, »))  _  Pm,n(s )  (3415) 

11a £s  -  T)k(m,n))(s-  i/k(m,n))  Qm,n{s)' 

where, 


/?A,(m,n)  =  a0m  (zj  +  inb0)  7  k{m,n)  =  a0m(zj  +  inb0)  (3.4.16) 

rjk(m,n)  =  a^m  {jpk  +  inb0)  vk(m,  n)  =  apm(pL+  inb0).  (3.4.17) 


Note  that  rjk(m,  — n)  =  Vk(m,  n )  and  uk(m,  — n)  =  rfifim,  n)  and  /3*.(m,  -n)  =  7 £(to,  n) 
and  7 fc(m,  — n)  =  pk(m,n).  Therefore  the  poles  of  Gm'n(s)  =  a’f,min(s)  +  a^m)_u(s) 
are 

{T/fc(m,n),^(m,n),i/A:(m,n),7T(m,n)}.  (3.4.18) 

Thus  the  effect  of  dilations  and  translations  upon  poles  is, 

•  Dilations  move  the  poles  of  Gm,n  radially  away  from  and  towards  zero.  As  the 
dilation  index  m  increases,  the  poles  move  towards  zero  and  as  m  decreases 
the  poles  move  away  from  zero.  However,  the  poles  remain  in  the  closed  left 
half-plane  which  is  crucial  since  otherwise  the  Gm,n  would  not  remain  in  H2(n+). 

•  The  complex  translations  simply  translate  the  poles  along  vertical  fines  in  the 
left  half-plane. 

Figure  3.9  illustrates  the  behavior  of  the  poles  of  Gm,n. 
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Figure  3.9:  Behavior  of  poles  of  WS  transfer  functions 
If  we  write  Gm,n(s)  =  JVmtn(s)/Dm>n(s),  then  the  zeros  of  Gm,n(s )  are  the  roots  of 

^7 n,n(^)  ^■frra,n('S)TlTOl_n(s)  -j-  OtP m,— n(s)-Dm,n(&) 

=  ®  II^  “  ra))(s  “  7 j(m,  n))  -  %(mj  n))(s  -  Pfc(m,  «)) 

3  k 

+«II(S  ~  /3j(7rt’  7i))(s  -  T7(m7n))II(s  ~  Vk(m,n))(s  -  vk(m,n)). 

3  k 

Without  actually  deriving  expressions  for  the  zeros  of  Gm,n(s )  we  make  the  following 
observations: 


•  The  zeros  of  Gm,n(s )  occur  in  complex  conjugate  pairs. 

•  For  n  ^  0,  if  Gm,n  has  no  real-axis  poles,  then  Nm>n(s)  and  Dm,n  are  coprime, 
and  therefore  Gm,n(s)  is  a  strictly  proper  rational  function  of  order  2 TV. 

•  For  n  =  0  pole-zero  cancellation  results  in  Gm,n(s )  of  order  N . 

•  If  l(n)  is  the  number  of  poles  of  Gm,n  on  the  real  axis  then  the  order  of  Gm,n(s) 
is  2 N  -  l(n). 
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3.4.3  State  Space  Realizations  of  Wavelet  System  Transfer  Functions 

Let  ?/>(£)  be  the  weighting  pattern  (i.e.  inverse  Laplace  transform)  corresponding  to 
the  real  rational  H2(II+)- wavelet  $  and  define  the  triplet  ( A,B,C )  to  be  a  minimal 
realization  of  In  this  section  we  examine  the  question  of  realizations  of  the  WS 
transfer  function  Gm'n(s )  =  aWm>„(s)+a4'm  _„(s).  Let  aR  and  a7  denote  respectively 
the  real  and  imaginary  parts  of  a.  Since  (A,  B,  C)  is  a  realization  of 

4>{t)  =  C  exp  (At)B. 

Taking  the  inverse  Laplace  transform,  the  weighting  pattern  corresponding  to  Gm,n(s ) 
is, 

gm,n{t)  =  aa-ml2i,{a^mt)einaomh^  + 

-  %m^2'ip(aQmt)2  aRcos(na,Qmb0t)  -  aism(na,Qmb0t)  (3.4.19) 

=  aa~m/2C  exp(ag  mTf)Be”lao"m&ot  +  aa~m/2C  exp(<Zo  ! ™  At)Be~ina°mhot 
=  atZg  ml2C  exp((ag m4.  +  inaQmb0I)t)B 

+aa,Qm/2C exp((ag”M  -  iciQmnb0I)t)B.  (3.4.20) 

Equation  (3.4.20)  immediately  provides  a  complex  realization  of  Gm,n(s )  which  is 


a0mA  +  ina0  mboI  0 

(3.4.21) 

Amn  — 

0  aQ m  A  —  ina.QmboI 

Bmn  ~ 

[B  B]T 

(3.4.22) 

c  - 

v-'mn  — 

—m/2  —m/2 — ^ 

a0  '  aC  aQ  '  aC  . 

(3.4.23) 

However,  since  V’m.n(i)  is  a  real  weighting  pattern,  we  need  a  change  of  basis  to  give 
us  a  real  realization.  The  differential  equations  corresponding  to  (3.4.23)  are 

x\  =  ( agmA  +  inaQmboI)x\  +  Bu 

x2  =  (a,Q m  A  —  inaQmboI)x2  +  Bu  (3.4.24) 

and  the  output  map  is  given  by 

y  —  aCx\  +  aQ1TLt2aCx2-  (3.4.25) 
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Let,  zi  =  {x\  +  x2)  and  z2  =  i(x i  -  x2).  Then  we  get 


Z\  —  q,q  m A.Z\  T  na0mboz2  -)-  2 Bu 

z2  —  %m Az2  -  naQmb0Z\.  (3.4.26) 

Under  this  transformation  the  output  map  becomes 

y  =  aQm/2  (aRC(x1  +  x2)  +  iaIC(xi- x2)) 

=  a,Q  m^2  ( ctRCzi  +  iaiCz2) .  (3.4.27) 


From  (3.4.26)  and  (3.4.26)  we  get  the  following  real  realization  of  Gm,n(s), 


aomA 


naomb0I 

— m 


n  ±  0  (3.4.28) 


Br, 


C„ 


(•‘d-m, 0?  Bm<o,  Cmfi')  — 


—na0mboI  a  0mA 

[2 B  0]T,  0 

aom/,2a«C  a0m/,2a;C]  ,  n  /  0 
(aomA,  B,  tig  m/2aC) 


(3.4.29) 

(3.4.30) 

(3.4.31) 


It  is  interesting  to  note  that,  in  this  form  (3.4.28-3.4.30),  the  dilations  and  translations 
which  appear  in  the  WS  transfer  function  Gm<n(s)  affect  the  state  space  realization 
via  dilations  and  translations  as  well,  i.e. 


A 


m,n 


—  &  n 


A  0 
0  A 


n  ^  0. 


3.4.4  Parallel  Connections  of  WS  Transfer  Functions 


In  constructing  rational  approximations  (see  Section  3.5),  we  will  be  considering  par¬ 
allel  connections  of  WS  transfer  functions  of  the  form, 


Gj(s)=  £  (3.4.32) 

(m,n)£j 
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where  J  is  a,  finite  index  set.  Given  Equations  (3.4.28-3.4.30)  defining  the  realization 
{Am,n,  Bm,n,Cmjn)  of  Gm,n(s),  a  state  space  realization  of  Gj(s)  is  readily  obtained. 


Aj  = 


A  i 

"77ll,7lJ 


A  1 

7711  >7^2 


0 


0 


m2  ,n 


2 

1 


1 


0 


A 


m  i 


,71 


2 

2 


0 


0 


0 


Bj 

Cj 


,7ij  >  Bm j  >nl  >  ’  '  ’ 


(3.4.33) 


(3.4.34) 

(3.4.35) 


3.4.5  Minimality  of  State  Space  Realizations 


An  obvious  question  that  can  be  asked  regarding  the  realization  (3.4.33-3.4.35)  of 
Gj{s)  is  whether  such  a  realization  is  minimal  in  the  sense  of  being  both  controllable 
and  observable.  We  have  the  following  result, 


Theorem  3.5  Let  ^  6  H2(II+)  be  an  admissible  rational  analyzing  wavelet,  let  {pj} 
denote  the  poles  of  and  let  ( A,B,C )  be  a  minimal  realization  of  4'  (of  dimension 
N ).  Also  let  J  be  a  finite,  bounded  index  set.  Then 


(a)  The  realization  (3. 4-28-3.  4-30)  of  Gm,n(s)  is  minimal  for  all  ( m.n )  6 
J  if  and  only  if,  for  each  j  =  1  ,...,N,  there  does  not  exist  a  nonzero 
integer  k,  such  that. 


Pj  _  , 
bo 

(b)  The  realization  (3. 4- 33-3. 4 .35)  ofGj(s)  is  minimal  if, 


(3.4.36) 


(1)  (Am^n,  Bm_n,Cm^n)  as  defined  in  (3.4-28-3.4.30)  is  a  minimal  realiza¬ 
tion  of  Gm,n(s)  for  all  (m,  n)  E  J,  and 
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(2)  For  each  pair  ( j,l )  G  {1, . . . ,  N}  x  {1, . . N},  there  does  not  exist  a 
nonzero  integer  k,  such  that. 


1  ,  f-$te  pj\ 

loga0  °°  \  — 9£e  pi  J 


=  k. 


(3.4.37) 


Proof:  (See  Appendix  B.l)  ■. 


Remarks:  The  proof  of  the  above  theorem  relies  on  there  being  no  cancellations  of 
poles  by  zeros  in  the  required  combinations  of  translates  and  dilates  of  the  analyzing 
wavelet.  Note  also  that  (3.4.36)  implies  that  the  poles  of  any  nonzero  translate  ( n  =£  0) 
of  the  analyzing  wavelet  remain  away  from  the  real- axis. 


3.4.6  Example:  Wavelet  System  Decomposition 
Example  1:  Heat  Equation  with  Dirichlet  Boundary  Control 


Consider  a  system  defined  by  the  partial  differential  equation, 

If  =  6  5  *(0)  =  2(!)  = 

y(t)  =  z(x0,t) 

This  system  has  transfer  function  [Cur88] 


(3.4.38) 


sinh  y/sxo 

Gl(s)  =  litwT' 

By  assuming  lowpass  characteristics  of  the  sensor  where  the  measurement  y,  is  made, 
we  can  write  the  overall  transfer  function  as  , 


G(s) 


1  sinh  y/sx o 
s  +  7r  sinh  y/s 


(3.4.39) 


A  decomposition  of  G(s)  (with  xo  =  0.5)  was  computed  using  12  dilation  levels  (to  = 
—5,..., 6)  and  up  to  33  translations  at  each  dilation  level.  The  analyzing  wavelet 
used  is  that  of  Example  1,  in  Section  3.3.1,  with  7  =  5  and  £  =  1.  The  results  of  this 
decomposition  are  shown  in  Figures  3.10-3.12  which  are  different  representations  of 
the  magnitude  of  the  wavelet  system  expansion  coefficients.  Along  the  dilation  axis. 
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zero  corresponds  to  the  lowest  dilation  level  (m  =  -5),  and  along  the  translation  axis 
zero  corresponds  to  n  =  0.  Figures  3.10  and  3.11  are  3D  and  contour  plots  respectively 
of  the  coefficient  magnitudes  and  Figure  3.12  is  a  density  plot  in  which  each  rectangle 
is  shaded  according  to  the  corresponding  coefficient  magnitude.  The  key  feature  in 


Figure  3.10:  Wavelet  system  decomposition  of  heat  equation  transfer  function  -  3D 
plot  of  magnitude  of  expansion  coefficients 

this  decomposition,  which  is  probably  most  obvious  in  the  density  plot  (Figure  3.12) 
is  that  the  magnitudes  of  the  coefficients  are  very  well  concentrated.  This  feature, 
which  is  due  to  the  time-frequency  localization  properties  of  affine  wavelets,  permits 
us  to  pick  a  finite  number  of  ‘significant’  terms  in  the  wavelet  system  decomposition. 
In  doing  so,  a  finite-dimensional  approximation  to  the  original  transfer  function  is 
obtained.  Using  the  results  of  Section  3.4.3,  it  is  also  possible  to  immediately  write 
down  a  minimal  state  space  realization  of  the  approximating  system. 
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Translations 

Figure  3.11:  Wavelet  system  decomposition  of  heat  equation  transfer  function  -  con¬ 
tour  plot  of  magnitude  of  expansion  coefficients 

3.5  Rational  Wavelet  System  Approximations 

Wavelet  system  decompositions  as  in  (3.3.11)  provide  a  means  of  representing  non- 
rational  transfer  functions  in  H2(II+)  as  infinite  sums  of  rational  transfer  functions1. 
As  mentioned  earlier,  our  primary  objective  in  deriving  such  a  decomposition  is  to 
devise  a  systematic  method  of  constructing  rational  approximations.  What  is  required 
now  is  a  mechanism  which  allows  for  judicious  selection  of  a  finite  number  of  terms 
from  the  expansion  in  (3.3.11)  which  will  allow  useful  approximation  of  a  nonrational 
transfer  function.  For  this  purpose  we  utilize  the  localization  properties  afforded  us 
’Note  that  causality  is  preserved  due  to  the  property  described  by  Equation  (3.1.3) 
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Translations 


Figure  3.12:  Wavelet  system  decomposition  of  heat  equation  transfer  function  -  density 
plot  of  magnitude  of  expansion  coefficients 


by  affine  wavelets. 

For  a  significant  class  of  transfer  functions  arising  from  physical  systems,  the  WS 
decomposition  will  result  in  a  reasonably  compact  representation  in  the  sense  of  well 
localized  and  rapidly  decaying  coefficients.  This  phenomenon  can  be  explained  on  the 
basis  of  the  time-frequency  localization  properties  of  the  WS  transfer  functions  and 
the  observation  that  H2(II+)  transfer  functions  arising  from  physical  systems  are  often 
well  localized  in  time-frequency  as  well.  It  is  possible  to  devise  a  variety  of  schemes, 
each  relying  on  time-frequency  localization  properties,  for  the  selection  of  terms  in  a 
WS  decomposition  for  use  in  a  finite-dimensional  approximation. 

One  possible  choice  for  the  selection  of  a  finite-dimensional  WS  approximation  can 
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be  based  upon  knowledge  of  the  time-frequency  concentration  of  the  (nonrational) 
transfer  function  which  is  to  be  approximated.  For  example  given  that  the  time- 
frequency  concentration  of  a  transfer  function  /  £  H2(II+)  is  Q(f)  we  can  select  a 
subset  of  the  wavelet  system  transfer  functions  based  upon  the  size  of  Q(/)fl2m,n) 
where  Qm,n  is  the  concentration  of  the  WS  transfer  function  Gm,n(s).  Daubechies  in 
[Dau90]  provides  bounds  for  the  error  of  such  approximations  in  terms  of  the  energy  of 
/  outside  Q(/).  A  second  selection  scheme  could  consist  of  first  computing  the  wavelet 
expansion  coefficients  for  a  large  number  of  terms  and  then  simply  discarding  those 
terms  with  ‘small’  coefficients.  This  second  method  is  similar  to  those  used  to  achieve 
signal  or  image  compression  via  wavelet  decompositions  (c.f.  [CMQW90,  Wic89]). 
One  such  selection  criterion,  can  be  based  upon  the  (?  norm  of  the  coefficients.  Assume 
that  the  WS  decomposition  of  a  transfer  function  G  has  been  computed,  and  let 
«m,n  =  (G,  5-1'I'm>n)  denote  the  coefficients.  We  choose  the  K  largest  coefficients, 
(whose  indices  we  store  in  an  index  set  J),  where  K  —  #(J)  is  the  smallest  integer 
such  that, 


X/(m,n)gj7’  I^TO.n  I  ( ,  ^ 

— =;  —  j2~  —  l1  ~  °)> 

Z^(?n,n)  lam,n| 

where  0  <  6  <  1  is  some  predetermined  tolerance.  We  now  define  a  rational  approx¬ 
imation,  Gj,  to  G  as,  Gj(s)  =  E(m,n)ej^m’’l(s)'  Note  that  the  results  of  Section 
3.4.3  immediately  provide  a  minimal  state  space  realization  of  Gj. 


3.5.1  Approximation  Error  Bounds 

The  following  lemma  provides  a  bound  on  the  approximation  error  using  the  scheme 
just  described. 

Lemma  3.2  Let  {xn}n  £  7L  be  a  frame  for  a  Hilbert  space  7i,  with  frame  bounds  A 
and  B.  Assume  ||£n||  =  1,  for  all  n  £  7L.  For  any  f  £  'H,  define  an  approximation  f 
to  f  by, 

f=Yl  (f’S~lxn)xn, 
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where  J  is  an  index  set,  chosen  to  satisfy 

n€J  ngZ 

for  0  <  6  <  1 .  Then, 

Proof:  By  the  frame  condition, 

fl-'ll/-/!l2<  E  iA-'Wf-ftf. 

fcgZ 

Since,  /  -  /  =  E  n£j  (/> S~1xn)  xn,  we  have  two  coefficients  sequences  representing 
the  expansion  of  f  —  f  with  respect  to  the  frame  {r„}.  Therefore  by  Theorem  2.5, 

£|(/,5-u.}|2>  e|(/ 

n£J  fcgZ 

Therefore,  > 

B-'u-ff  <  Ei(/-/.^>r 

AigZ 

<  |(/»-S'“1®n)  2 

n$J 

<  sJ2\(f’s~lxk)\2 

fcgZ 

<  SA~l\\f\\2 

from  which  the  result  follows.  ■ 

Remark:  The  error  bound  in  Lemma  3.2  is  established  in  the  general  setting  of 
frames  in  Hilbert  spaces.  When  applied  to  the  specific  case  of  affine  wavelet  frames,  the 
bound  may  prove  to  be  quite  conservative.  This  is  because  time-frequency  localization 
properties  of  affine  wavelets  are  not  exploited  in  the  lemma. 

3.5.2  Example:  Rational  WS  Approximation 

As  an  example,  consider  the  WS  decomposition  of  the  heat  equation  transfer  function 
in  Section  3.4.6.  Letting  6  =  0.4  the  above  described  scheme  results  in  the  selection 
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of  7  terms,  with  a  corresponding  normalized  L 2  approximation  error  of  0.109,  and  an 
approximating  system  of  dimension  22. 

3.6  Some  Motivation  and  Remarks  in  Retrospect 

A  point  should  be  made  regarding  our  motivation  for  constructing  affine  frames  in 
the  frequency  (Laplace)  domain  as  opposed  to  directly  decomposing  weighting  pat¬ 
terns  via  affine  wavelets  in  the  time  domain.  First  of  all,  our  approach  naturally 
preserves  causality  of  the  approximating  system  since  each  term  in  the  WS  expansion 
is  in  H2(n+)  and  therefore  corresponds  to  the  transfer  function  of  a  causal  system. 
If  one  were  to  use  affine  wavelets  in  the  time  domain,  special  ‘tricks’  need  to  be  ap¬ 
plied  to  preserve  causality  since  (even  in  the  case  of  compactly  supported  wavelets), 
translations  would  eventually  result  in  a  noncausal  wavelet.  Secondly,  there  would 
be  no  mechanism  for  retaining  rationality  of  the  Laplace  transforms  of  the  individual 
wavelets,  since  translations  (delays)  prevent  this. 

Orthonormal  Wavelet  Bases  for  H2  ? 

In  the  beginning  of  this  chapter,  it  was  mentioned  that  there  are  certain  difficulties 
associated  with  constructing  ‘well-behaved’  orthonormal  wavelet  bases  for  H2(n+).  To 
fully  clarify  this  statement  would  require  some  description  of  the  recently  formalized 
subject  of  multiresolution  analyses  [Mal89cj.  We  forgo  such  a  description  and  refer  to 
[Dau88a]  for  a  thorough  treatment  of  the  subject. 

It  is  well-known  that  associated  with  every  multiresolution  analysis  of  L2(IR)  (or 
H2),  there  exists  an  orthonormal  wavelet  basis.  In  fact,  to  date,  all  known  examples  of 
orthonormal  wavelet  bases  can  be  associated  with  multiresolution  analyses.  In  the  case 
of  L2,  it  is  possible,  via  multiresolution  analyses,  to  construct  orthonormal  wavelet 
bases  which  are  very  well-behaved  in  the  sense  of  the  wavelets  being  arbitrarily  smooth 
and  being  well-localized  in  time-frequency.  However,  if  one  considers  the  space  H2(n+) 
instead,  the  situation  is  quite  different.  In  fact  it  is  not  even  possible  to  construct  an 
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orthonormal  basis  of  wavelets  in  H2(II+)'  with  continuous  and  well  localized  Fourier 
transform  via  multiresolution  analyses.  More  precisely,  there  is  the  following  theorem 
of  Jaffard  [Jaf89,  JL92]. 


Theorem  3.6  There  exists  no  orthonormal  basis  of  wavelets  for  H2(II+)  generated 
via  the  framework  of  multiresolution  analyses,  with  an  analyzing  wavelet  ip  such  that 
ip  is  continuous  and, 


ip(u)\  <  C\l>\~ 


for  any  a  >  1/2. 


In  particular  this  rules  out  smooth  and  well-localized  orthonormal  bases  of  rational 
wavelets  for  H2(II+).  Although  it  has  not  been  demonstrated  that  multiresolution 
analyses  are  the  only  means  of  constructing  wavelet  orthonormal  bases,  it  remains  an 
open  problem  to  construct  an  orthonormal  wavelet  basis  (even  for  L2(IR)  )  which  does 
not  arise  in  this  way.  The  above  theorem  suggests  that  if  one  wishes  to  use  rational 
analyzing  wavelets,  it  is  perhaps  necessary  to  consider  the  more  general  setting  of 
frames  (including  Riesz  bases). 


3.7  Summary 

In  this  chapter  we  have  introduced  a  new  decomposition  (which  we  refer  to  as  a 
wavelet  system  (WS)  decomposition )  of  the  Hardy  space  H2(n+),  and  examined  some 
properties  of  this  decomposition. 

The  main  result  of  this  chapter  is  in  Theorem  3.3  which  states  that  functions  in 
H^(n+)  may  be  represented  as  infinite  sums  of  time-frequency  localized,  real-rational 
functions  of  bounded  degree.  Construction  of  the  WS  representation  is  based  on  an 
appropriate  grouping  of  terms  in  an  affine  wavelet  frame  decomposition  of  H2(n+), 
where  the  analyzing  wavelet  is  real-rational.  For  a  real-rational  analyzing  wavelet  of 
degree  N  the  terms  in  a  WS  expansion  are  either  of  degree  N  or  2 N.  From  a  systems 
viewpoint,  WS  decompositions  are  representations  of  causal  LTI  systems  as  infinite 
parallel  connections  of  causal,  time-frequency  localized,  finite-dimensional  systems 
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of  dimension  bounded  by  2 N.  Theorem  3.4  gives  a  characterization  of  all  rational 
analyzing  wavelets  for  H2(II+),  and  demonstrates  the  large  amount  of  freedom  offered 
by  frame  theory  for  the  selection  of  ‘basis’  functions. 

WS  decomposition  naturally  leads  to  methods  for  constructing  rational  approx¬ 
imations  to  nonrational  transfer  functions  in  H2(II+).  Criteria  for  selecting  a  finite 
number  of  terms  from  the  infinite  wavelet  system  expansion  rely  on  compactness 
of  representation  which  arises  due  to  time-frequency  localization  properties  of  the 
wavelet  systems.  Coarse  bounds  on  the  L2  approximation  error  were  established,  for 
one  particular  selection  criterion.  It  was  also  shown  that  state-space  realizations  of 
WS  approximations  are  easily  generated  from  a  minimal  state-space  realization  of  the 
analyzing  wavelet.  Furthermore,  easily  verified  conditions  for  the  minimality  of  such 
realizations  were  derived.  Much  work  remains  to  be  done  to  understand  the  approach 
proposed  here  against  the  background  of  prior  work  on  L2  and  H°°  approximation 
theory  (c.f.  [GLP91b,  GLP90,  GLP91a,  Bar87]). 


67 


Chapter  4 


System  Identification  Using 
Wavelet  System  Models 


In  this  chapter  we  discuss  the  use  of  wavelet  system  representations  for  the  purpose 
of  identification  of  an  unknown  causal  linear  time-invariant  system.  The  problem  of 
identification  may  be  formulated  as  the  determination  of  a  system  model  representing 
the  essential  aspects  of  an  existing  system  (or  a  system  to  be  constructed)  and  pre¬ 
senting  knowledge  of  that  system  in  usable  form  [Eyk74] .  For  a  slightly  more  concrete 
definition  in  the  present  context,  consider  a  linear  system  of  the  form  shown  in  Figure 
4.1,  where  h(-)  is  the  weighting  pattern  (impulse  response),  u(-)  is  the  input,  y(-) 
is  the  output,  and  v(-),  is  a  disturbance  (noise).  In  reference  to  Figure  4.1,  system 


u(t) 


v(t) 


Unknown 

Dynamical 

System 


y(t) 


Figure  4.1:  Open  loop  setting  for  system  identification. 
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identification  deals  with  the  problem  of  constructing  usable  mathematical  models  of 
the  system  h,  based  on  observed  data  ( u  and  y ).  There  exists  a  vast  collection  of 
literature  devoted  to  the  subject  of  system  identification  which  we  do  not  attempt  to 
review  here.  A  number  of  good  survey  papers  and  books  with  extensive  references 
have  been  written  on  various  aspects  of  the  subject  which  we  refer  the  reader  to  for 
further  details  (see  e.g.  [Eyk74,  Lju87,  UR90,  Str81,  Wel81,  You81]).  Here  we  only 
review  some  of  the  basic  philosophy  of  system  identification  with  particular  empha¬ 
sis  on  the  choice  of  model  sets.  We  discuss  in  some  detail  the  role  which  wavelet 
system  representations  can  play  in  system  identification  and  highlight  some  of  the 
resulting  benefits  by  means  of  examples.  Here  we  treat  truncated  WS  representations 
as  linear-in-parameters,  black-box  model  sets  for  system  identification.  A  key  feature 
of  WS  models,  is  time-frequency  localization  which  provides  a  convenient  means  of 
incorporating  time  and  frequency  domain  a  priori  knowledge  in  the  formal  properties 
of  a  parametric  model.  Our  use  of  truncated  WS  representations  is  close  in  spirit  to 
the  use  of  truncated  Laguerre- Fourier  series  as  rational  parametric  models.  Laguerre 
filters  form  a  class  of  orthonormal  bases  for  H2(n+),  and  have  received  considerable 
attention  in  the  areas  of  rational  approximation  and  system  identification.  We  make 
both  qualitative  and  numerical  comparisons  of  some  key  properties  of  W'S  models  and 
Laguerre  models.  The  two  examples  considered  in  the  numerical  comparisons  are: 
(1)  approximation  of  ‘cochlear’  filter  transfer  function  and,  (2)  approximation  of  a 
second-order  system  with  delay.  Numerical  results  of  these  two  examples  suggest  that 
the  performance  of  WS  models  may  be  far  superior  to  that  of  Laguerre  models  for  an 
important  class  of  systems. 

4.1  Overview  of  System  Identification 

Among  the  various  methods  for  system  identification,  a  clear  distinction  may  be  made 
between  parametric  and  non  parametric  techniques.  We  briefly  outline  some  of  the 
essential  ideas  of  both  nonparametric  and  parametric  identification  identification. 
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4.1.1  Nonparametric  Identification  Methods 

As  opposed  to  parametric  methods  which  attempt  to  estimate  parameters  of  some 
given  system  structure,  nonparametric  methods  deal  with  estimating  points  on  an 
unparameterized  system  model.  In  the  time-domain,  nonparametric  identification 
attempts  to  estimate  the  impulse  response  of  an  unknown  system.  The  essential 
relationship  on  which  nonparametric  impulse  response  estimation  is  based  is, 

Ruy(r)  =  (h  *  Ruu)  (r),  (4.1.1) 

where  Ruu  is  the  autocorrelation  function  of  the  input  u,  and  Ruy,  is  the  cross¬ 
correlation  function  of  the  input  u,  and  the  output  y.  Frequency-domain  nonparamet¬ 
ric  identification  deals  with  estimation  of  the  transfer  function  of  an  unknown  system, 
and  is  in  general  based  upon  the  frequency- domain  expression  of  Equation  4.1.1: 

Suy(iu )  =  H(iu)Suu(iu).  (4.1.2) 

In  Equation  4.1.2  Suu  and  Suy  are  the  spectrum  and  cross-spectrum  respectively  ob¬ 
tained  via  the  Fourier  transform  of  Ruu  and  Ruy ,  and  is  the  frequency  response 

of  the  system. 

The  methods  of  nonparametric  identification  are  primarily  concerned  with  the 
estimation  of  the  functions  Ruu  and  Ruy  (time-domain),  or  Suu  and  Suy  (frequency- 
domain).  For  a  survey  of  some  of  these  techniques  see  [Wel81]. 

Due  to  the  current  emphasis  on  control  synthesis  and  design  tools  which  rely  on 
parametric  models,  it  is  often  necessary  to  construct  a  parametric  model  from  the 
results  of  nonparametric  identification.  We  will  discuss  the  use  of  wavelet  system 
representations  for  the  purpose  of  generating  parametric  models  from  nonparametric 
forms  in  Section  4.2.2. 

4.1.2  Parametric  Identification  Methods 

The  basic  methodology  of  parametric  identification  consists  of  the  following  four  steps 
which  are  depicted  in  Figure  4.2  (adapted  from  [Lju87]). 
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Figure  4.2:  Parametric  system  identification  loop. 


Experiment  design  and  collection  of  data.  An  essential  property  of  data  col¬ 
lected  for  the  purpose  of  identification  is  that  it  be  ‘informative  enough’  for  the  con¬ 
struction  of  a  reliable  model.  The  imprecise  notion  of  data  which  is  ‘informative 
enough’  may  be  precisely  formulated  with  respect  to  a  given  model  set  (see  [Lju87]). 
A  well  designed  identification  experiment  is  one  which  results  in  the  collection  of  such 
informative  data. 


Selection  of  a  (parameterized)  model  set.  For  parametric  identification,  a  fam¬ 
ily  of  parameterized  candidate  models  must  be  selected  within  which  the  search  for  a 
suitable  model  is  to  be  made.  It  has  often  been  noted  that  selection  of  an  appropriate 
model  set  is  the  most  important  as  well  as  the  most  difficult  step  of  the  identification 
process.  Choosing  a  good  model  set  for  a  particular  problem  involves  incorporating  a 
combination  of  a  priori  knowledge  and  engineering  intuition  into  the  formal  properties 
of  the  chosen  collection  of  models. 
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Selection  of  identification  method.  A  large  proportion  of  the  system  identifi¬ 
cation  literature  has  been  devoted  to  design  and  selection  of  identification  methods. 
A  key  feature  of  every  identification  method  is  the  particular  criterion  that  is  chosen 
to  make  precise  the  notion  of  which  of  the  models  in  the  model  set  ‘best’  repre¬ 
sents  the  data.  A  very  broad  categorization  of  the  variety  of  proposed  identification 
methods  would  lead  us  to  select  between  techniques  which  treat  the  problem  in  ei¬ 
ther  the  deterministic  or  the  stochastic  settings,  and  also  select  between  time-domain 
and  frequency- domain  methods.  Once  again  we  give  reference  to  the  survey  papers 
[Lju87,  UR90,  Str81,  Wel81,  You81]  for  further  detail. 

Model  validation.  Having  selected  a  model  from  the  model  set  according  to  the 
criterion  of  fit  defined  by  the  selected  identification  method,  model  validation  is  the 
process  of  determining  how  well  the  model  describes  the  unknown  system  based  on 
further  experiments.  Following  model  validation  the  identified  model  is  either  accepted 
or  rejected.  If  the  model  is  rejected,  the  identification  process  may  be  repeated, 
possibly  with  new  data,  a  different  model  set,  and  a  different  identification  method. 

Here  attention  is  largely  restricted  to  the  problem  of  selecting  a  suitable  model 
set  as  this  is  where  wavelet  system  representation  provides  a  new  approach.  We  shall 
consider  only  models  of  the  form  commonly  known  as  black  box  models.  Parameters 
of  a  black  box  model  have  no  real  physical  significance  in  relation  to  the  physical 
system  being  modeled.  Black  box  model  parameters  should  be  viewed  as  a  means  of 
for  approximating  the  input-output  behavior  of  the  system. 

4.2  Parametric  Identification  Using  Wavelet  System 
Model  Sets 

As  mentioned  above,  the  importance  and  difficulty  of  selecting  an  appropriate  model 
set  has  often  been  noted.  The  difficulty  lies  in  first  developing  a  priori  knowledge 
and  engineering  insight  to  recognize  what  are  the  relevant  aspects  of  the  unknown 
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system  that  should  be  captured  by  the  identified  model,  and  then  incorporating  this 
knowledge  into  the  choice  of  a  model  set.  It  is  necessary  that  the  model  set  be  ‘rich’ 
enough  to  capture  the  relevant  aspects  of  the  system  while,  at  the  same  time,  it 
is  important  that  the  complexity  of  the  model  be  kept  as  low  as  possible.  In  this 
section  we  discuss  how  model  sets  which  consist  of  appropriately  truncated  wavelet 
system  representations,  can  be  used  to  incorporate  certain  forms  of  a  priori  knowledge, 
namely  joint  time-frequency  information  about  the  systems’  behavior.  The  manner  in 
which  wavelet  system  representations  are  used  here  is  very  close  in  spirit  to  the  use  of 
truncated  Laguerre  representations  in  system  identification  (c.f.  [Mak90a,  Mak90b, 
Wah91,  Par91].  We  discuss  some  of  the  key  similarities  and  differences  in  these  two 
approaches. 

From  Chapter  3,  we  know  that  any  transfer  function  G  E  Hfl(n+)  may  be  repre¬ 
sented  by  its  WS  decomposition, 

OO 

g(>)  =  E  E  <4-2'3) 

mgZ  n=0 

where  Gm,n(s )  are  the  wavelet  system  transfer  functions  obtained  via  the  wavelet 
decomposition  of  G  with  respect  to  a  real-rational  analyzing  wavelet  $  E  RH2(II+). 
If  we  rewrite  Equation  4.2.3  as, 

OO 

G(s)  —  ^  ]  (%m,0 T  "y  ]  [o^Tn, 71  ^m,n (^)  “h  m,— n (^)] 

771  €  2  n=l 

CO 

=  E  EGm,n(^)>  (4.2.4) 

771 6  2  n— 0 

we  see  that  this  is  precisely  a  parametric  model  for  transfer  functions  in  H^(II+), 
where  the  parameters  are  the  wavelet  expansion  coefficients  {am,n}mej_  neZ+ '  From 
a  computational  point  of  view,  it  is  perhaps  easier  to  separate  the  real  and  imaginary 
parts  of  each  WS  transfer  function  Gm,n(s ),  and  write, 

sfte  Gm'n  =  9te  a  §ie  -  Im  a  Im  ($m,n  -  ^m-n) 

Im  Gm,n  =  5?e  a  Im  ($m.„  +  +Im  a  3ie  , 

since  the  coefficients  and  functions  appearing  in  the  expressions  are  then  real-valued. 
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In  practice  of  course  it  is  necessary  to  select  a  truncation  of  the  model  in  (4.2.4) 
for  use  as  a  model  set  for  identification  purposes.  That  is,  we  consider  model  sets  of 
the  form 

M*  =  \  M(a)  =  J2  Gm'n(s;a)  :  a£Vct2(7L2)l  (4.2.5) 

(  m,n£j  J 

and  then  select  an  appropriate  parameter  estimation  technique  to  estimate  the  pa¬ 
rameters  a  based  on  observed  data.  We  will  not  discuss  any  of  the  possible  choices  for 
parameter  estimation  using  wavelet  system  models.  It  should  be  noted  however  that 
since  the  parameters  in  the  WS  model  appear  linearly,  it  is  possible  to  use  least-squares 
techniques  (see  [Lju87]  for  a  treatment  of  many  of  the  available  choices).  Models  such 
as  this  are  often  referred  to  as  linear-in  -parameters  models. 

As  we  shall  see  below,  it  is  in  the  selection  of  this  truncation,  that  certain  forms 
of  a  priori  knowledge  and  engineering  intuition  may  be  incorporated. 

4.2.1  Incorporating  A  Priori  Knowledge  in  WS  Model  Sets 

Here  we  consider  two  forms  of  a  priori  knowledge  which  are  commonly  used  in  system 
identification:  (1)  knowledge  of  important  frequency  bands  (frequency-domain),  (2) 
knowledge  of  delays  and  time  constants  present  in  the  system  (time-domain).  These 
two  forms  of  a  priori  information  are  often  treated  separately  in  choosing  model  sets 
for  identification.  However,  in  the  case  of  time-frequency  localized  model  sets,  such  as 
given  by  WS  representations,  there  is  a  natural  mechanism  for  treating  time-domain 
and  frequency-domain  information  simultaneously. 

It  is  common  practice  in  system  identification  to  estimate  parameters  using  some 
form  of  frequency  weighting.  That  is,  given  some  a  priori  knowledge  or  intuition 
regarding  the  frequencies  which  are  important  in  the  particular  application,  identifi¬ 
cation  methods  are  designed  so  as  to  emphasize  good  models  over  these  frequencies  or 
frequency  bands.  One  method  of  doing  so  involves  the  use  of  a  frequency  weighting 
function  in  the  cost  function  associated  with  the  particular  identification  method.  For 
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example  a  scaler  cost  function  of  the  form, 


J{9)  =  /  (G(w;  9)  -  G(uf)  C{u)  (g(w;  0)  -  G(w))  du>, 

may  be  used,  where  C(u>)  is  a  frequency  weighting  function  used  to  emphasize  par¬ 
ticular  frequency  bands.  Other  schemes  for  emphasizing  good  modeling  in  particular 
frequency  bands  involve  e.g  prefiltering  of  the  data  (see  [Lju87]). 

Another  problem  of  some  significance  in  system-identification,  is  the  approxima¬ 
tion  of  delays  in  the  system.  Systems  with  delays  are  of  course  infinite-dimensional 
systems  (with  nonrational  transfer  functions).  In  such  cases  finite-dimensional  (ratio¬ 
nal)  approximations  are  required.  A  priori  knowledge  of  time-delays  can  be  incorpo¬ 
rated  into  certain  rational  model  structures  such  as  Laguerre  models  which  we  discuss 
in  Section  4.3,  and  WS  models  as  will  be  discussed  later. 

The  most  important  property  of  wavelet  system  representations  in  the  context  of 
black  box  models,  is  that  they  provide  a  straightforward  means  of  capturing  time- 
varying  frequency  behavior.  Since  each  wavelet  system  transfer  function  Gm,n  is 
localized  in  time-frequency  on  rectangles  <2m,n,  we  can  define  regions  of  time- frequency 
on  which  we  would  like  to  concentrate  our  model  set.  Figure  4.3  graphically  illustrates 
the  various  ways  in  which  a  priori  time-frequency  information  can  be  incorporated 
into  a  model. 

In  this  section  we  discuss  the  use  of  a  priori  knowledge  in  WS  models  from  the  point 
of  view  of  time-frequency  localization.  This  point  of  view  is  useful  in  developing  an 
intuitive  understanding.  In  Section  4.5,  a  priori  knowledge  in  WS  models  is  discussed 
from  the  slightly  more  concrete  point  of  view  of  pole- placement. 

Time-Domain  Information 

Delays:  Consider  a  system  with  impulse  response  h(-)  6  L2(0,oo)  which  includes 
a  delay  r;  i.e. 

h(t )  =  0,  for  t  <  t. 
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Time 


Figure  4.3:  Time- Frequency  weighting  schemes:  (a)  Frequency  weighting,  (b)  Time 
weighting:  Delays  and  time  constants,  (c)  Joint  time-frequency  weighting. 
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Assuming  we  have  some  a  priori  knowledge  or  estimate  of  r,  this  knowledge  can 
be  incorporated  in  a  WS  model  by  only  including  dilation  indices  m  such  that 
Ri'&m.n)  fl[0> r]  —  where  R^m,™)  is  the  time  concentration  of  the  wavelet  ®TO)n 
(c.f.  (3.4.14)).  Such  a  choice  concentrates  the  model  outside  the  time  interval  [0,rj. 

Time  Constants:  Knowledge  of  dominant  time  constants  of  the  system  to  be  ap¬ 
proximated  is  approximately  the  knowledge  of  the  time-concentration  of  the  impulse 
response.  A  priori  information  regarding  time-constants  may  also  be  used  in  the 
selection  of  dilation  indices  m,  to  include  in  the  WS  model. 

Remark:  Note  that  since  dilations  are  used  to  represent  both  delays  and  time  con¬ 
stants,  it  is  reasonable  to  expect  that  there  may  arise  situations  where  these  two  goals 
are  conflicting.  This  is  in  fact  the  case,  as  will  be  shown  in  Section  4.5.  A  similar 
situation  arises  in  the  case  of  Laguerre  models  as  well. 

Frequency-Domain 

Knowledge  of  important  frequency  bands  is  easily  utilized  in  a  WS  model  by  simply 
choosing  translation  indices  n  such  that  the  frequency  concentrations  fl('hTO,n),  ‘cover’ 
the  frequencies  of  intertest. 

Joint  Time-Frequency  Space 

In  the  case  of  physical  dynamical  systems,  the  impulse  response  in  often  best  described 
in  terms  of  its  time-varying  frequency  content.  This  point  is  best  illustrated  by  means 
of  the  “typical”  impulse  response  shown  in  Figure  4.4  (a).  It  is  clear  that  high- 
frequency  components  of  this  impulse  response  are  localized  near  the  origin.  Also 
away  from  the  origin,  we  see  primarily  low-frequency  behavior,  i.e  high  frequency 
components  are  associated  with  short  time  constants  and  longer  time  constants  are 
associated  with  low  frequency  behavior.  This  is  in  fact  the  form  of  time-frequency 
localization  that  arises  in  WS  models  (see  Figure  3.8).  Once  again  this  can  be  used 


Figure  4.4:  Impulse  response  exhibiting  time-varying  frequency  content. 

to  explicitly  determine  the  choice  of  dilations  and  translations  (indices  (m,n))  to  be 
included  in  the  model. 

4.2.2  Fitting  WS  Models  to  Nonparametric  Models 

Wavelet  system  models  can  also  be  used  to  construct  parametric  models  based  on  the 
results  of  a  nonparametric  identification  procedure.  In  particular,  given  a  nonpara¬ 
metric  transfer  function  estimate  Gjvp,  we  can  ‘fit’  a  parametric  WS  model  to  Gjvf 
by  computing  the  decomposition, 

rt  V-''  p^m,n 

VffS  —  /  ,  vr NP • 

In  fact,  the  nonparametric  model  Gnp ,  can  provide  precisely  the  type  of  a  priori 
knowledge  about  the  time-frequency  content  of  system,  that  is  required  to  make  a 
sensible  choice  of  the  index  set  J ,  as  discussed  in  Section  4.2.1.  We  will  illustrate  the 
use  of  this  technique  by  example  in  Section  4.7.1. 


78 


4.3  Laguerre  Decompositions 

Laguerre  functions  have  been  studied  extensively  in  the  contexts  of  system  identifi¬ 
cation  and  approximation  of  infinite-dimensional  systems.  The  use  of  Laguerre  de¬ 
compositions  in  this  context  is  very  close  to  the  spirit  of  our  work  using  wavelet 
system  decompositions  and  it  is  worthwhile  to  make  an  assessment  of  some  of  the  key 
similarities  and  differences. 

We  begin  by  reviewing  some  of  the  basic  properties  of  Laguerre  polynomials  the 
associated  orthonormal  bases  of  L2(0,oo)  (and  consequently  H2(n+)).  A  thorough 
treatment  of  this  subject  may  be  found  for  instance  in  the  book  of  Szego  [Sze39]. 

4.3.1  Laguerre  Polynomials  and  Orthonormal  Bases 

The  classical  Laguerre  polynomials  Lm(t)  are  defined  by, 

Lm{t)=  m~  0,1,...  (4.3.6) 

If  we  define, 

we  have  the  following  well  known  results. 

Theorem  4.1  (Szego[Sze39]) 

(1)  The  Laguerre  functions  {<pm}^_0  form  an  orthonormal  basis  for  L2(0,  oo)  . 

(2)  The  Laguerre  functions  {^m}“=0  are  dense  in  L1(0,  oo). 

(3)  For  all  t  >  0, 

IM*)|<1,  m  =  0,1,2,...  (4.3.7) 

(4)  For  any  fixed  to  >  0, 

max  \(fm{t)\  =  0(m-1/4). 
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As  a  natural  modification  of  the  classical  Laguerre  functions,  let, 

=  \Z^pe~ptLm{2pt),  p>  0;  m  =  0,1,2,... 

Then  it  follows  from  Theorem  4.1  that  the  sequence  {4^n}m=o  is  an  orthonormal  basis 
for  L2(0,oo)  ,  and  that  properties  analogous  to  those  listed  in  the  Theorem  4.1  can 
be  derived.  The  constant  p  >  0  is  known  as  the  Laguerre  shift  parameter ,  for  reasons 
which  will  be  clarified  in  Section  4.6.2. 

Remark:  Note  that  is  simply  a  dilation  of  <f>m  by  a  factor  of  2 p. 

Hence  we  have  the  Fourier  decomposition  of  g  £L2(0,oo)  with  respect  to  the  La¬ 
guerre  basis  {</>£j~=0, 

OO 

(4-3-8) 

m— 0 

Remark:  Although  convergence  of  the  series  (4.3.8)  is  in  general  only  guaranteed 
in  the  L2  sense,  convergence  in  other  norms  (e.g.  ||  •  ||i,  ||  •  ||oo )  can  be  established 
under  suitable  hypotheses  on  g,  as  will  be  mentioned  later. 

Laguerre  Function  Bases  for  H2(n+) 

Taking  the  Laplace  transform  of  let 

*«•)■=  WWW  =  jf ™  =  . («.9) 

By  the  Paley- Wiener  theorem  and  Parseval’s  theorem,  is  an  orthonormal 

basis  for  H2(n+).  Hence  for  any  G  €  H2(n+), 


where  the  coefficients  c^(G),  are  given  by  the  appropriate  inner-products. 

Remark:  The  Laguerre  series  representation  (4.3.10)  is  a  decomposition  of  any 
transfer  function  in  H2(n"*“)  via  rational  transfer  functions  with  a  single  real  pole 
-p  <  0,  with  increasing  multiplicity.  This  fact  may  cause  slow  convergence  of  the 
series  (4.3.10)  for  certain  classes  of  transfer  functions  as  described  in  the  next  section. 
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4.4  A  Priori  Knowledge  in  Laguerre  Models 

We  briefly  review  the  mechanisms  available  for  the  incorporation  of  a  priori  informa¬ 
tion  in  Laguerre  models.  The  essential  forms  of  prior  information  that  may  be  used 
in  Laguerre  models  are:  (l)knowledge  of  time  delays,  and  (2)  knowledge  of  dominant, 
time-constants. 


4.4.1  Representation  of  Delays  in  Laguerre  Models 


It  has  often  been  noted  that  Laguerre  models  are  suitable  for  representing  delays 
(c.f.  [Mak90a,  Par91,  DZP90]).  To  understand  how  time  delay  information  may  be 
incorporated  in  a  Laguerre  model,  let  us  introduce  the  following  equivalent  definition 
of  the  Laguerre  basis  which  is  given  in  terms  of  a  time-domain  shift  operator  (c.f. 
[Mak90a]). 

Define  T 

T  =  (V-pl)(V  +  pl)~l  , 


where  p  >  0,  and  V  is  the  infinitesimal  generator  of  the  semigroup  {Tt}t>o  of  all  right 
shifts  on  L2(0,oo)  ,  i.e. 


(Trf)(t)  =  | 


0 

/(<  -  r) 


for  t  <  r 
for  t  >  t 


and  V,  is  defined  by, 

Vf  =  lim  —  (Tt  -  1)  /, 

t|0  T 

where  the  limit  is  in  the  strong  sense.  Thus  if  /  is  absolutely  continuous  with  /( 0)  =  0, 


V f  =  —df/dt.  The  operator  T  is  called  the  cogenerator  of  the  semigroup  {Tt}t>q.  T 


is  a  shift  operator  in  the  sense  that, 


T  is  an  isometry, 
and  (T*)n  q,  strongly  as  n  ->  oo. 

T  is  called  the  Laguerre  shift  on  L2(0,oo)  ,  and  T  and  T*  are  given  explicitly  by, 

(T /)  (t)  =  /(f)  -  2 p  f  e-p(f~T)  f{r)dr 

Jo 
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(T7)(<)  =  f(t)-2pj\-^f(r)dr 

Now  a  second  equivalent  definition  of  the  Laguerre  functions  may  be  made  in 
terms  of  the  Laguerre  shift  operator  T, 


C(0  =  V^pTme-pt. 


(4.4.11) 


Hence,  the  Laguerre  functions  are  generated  by  right  shifts  of  a  single  function,  where 
the  shift  is  defined  by  the  Laguerre  shift  operator. 

To  see  the  effect  of  the  Laguerre  shift  operator  from  a  slightly  different  viewpoint 
[DZP90],  note  that 


1  —  ST 


e~ST  =  Urn  (- — .  (4.4.12) 

N-*oo  V  1  +  st/2N  J 

Thus,  the  all-pass  factors  of  degree  m  in  the  Laguerre  filters  $^(5),  can  provide  a 
good  representation  of  a  time  delay  r.  From  (4.4.12),  we  see  that  the  pole  location  p 
of  the  Laguerre  model  should  be  interpreted  as  p  —  2 N/t  for  a  good  representation 
of  a  time  delay  r  by  a  Nth  order  Laguerre  model.  This  is  precisely  the  mechanism 
available  for  the  incorporation  of  a  priori  knowledge  of  time-delays  in  a  Laguerre 
model.  Namely  the  pole  location  p  and  model  order  N  may  be  selected  so  as  to 
provide  a  good  representation  of  the  delay  r. 


4.4.2  Use  of  Time  Constant  Information  in  Laguerre  Models 

As  noted  earlier  a  Laguerre  model  has  only  a  single  realleft  half  plane  pole  at  —p.  Thus 
in  representing  transfer  function  G  with  a  single  dominant  pole,  it  is  reasonable  to 
expect  that  knowledge  of  the  dominant  pole  of  G  can  be  incorporated  into  a  Laguerre 
model  via  the  selection  of  the  Laguerre  pole  —  p.  It  is  also  reasonable  to  expect  slow 
convergence  of  the  series  (4.3.10)  if  G  has  a  highly  resonant  dominating  pole. 

To  make  the  above  statements  more  precise,  consider  the  linear  fractional  trans¬ 
formation, 

s  +  p  z  +  1 
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which  is  a  conformal  mapping  of  the  left  half-plane  II-,  onto  the  open  unit  disk.  Let 
G  be  a  finite-dimensional  transfer  function  with  poles  {7^}.  It  can  be  shown  (see  e.g. 
[Wah91])  that  in  this  case,  the  rate  of  convergence  of  the  Laguerre  series  (4.3.10)  is 
governed  by  the  magnitude  of  the  2-plane  poles, 


7  k  +  P 
lk-p ’ 


(4.4.13) 


of  the  discrete-time  system  G  (p(z  +  1  )/(z-  1)).  In  particular  convergence  will  be 
slow  for  2-plane  poles  close  to  the  unit  circle.  For  any  given  pole  7*.  of  G(s), 


P*  =  \lk\  =  argmin 


Ik  +  P 
lk~P 


The  minimum  value  is  determined  by, 


7  k  +  \*fk  1  2 
Ik  -  |7fcl 


l7fc|  +  Me 

I7&  |  ~  7 k 

1  +  cos($) 

1  —  cos(#) 


cot2(0/2), 


where  7*.  =  \jk\  et9.  Hence,  the  minimum  value  is  given  by, 


=  |cot(arg(7*)/2)| .  (4.4.14) 

Thus  there  exists  a  notion  of  an  ‘optimal’  choice  of  the  Laguerre  shift  parameter  p 
to  capture  the  effect  of  dominant  poles.  Consequently,  we  may  make  the  following 
observations  regarding  the  the  choice  of  p. 


7  k  +  P* 


7  k  ~  V 


(1)  For  a  high  rate  of  convergence  of  the  Laguerre  series,  p  should  be  chosen  such  that 
1/pis  close  to  the  dominating  time-constants  of  the  system  to  be  approximated. 

(2)  If  the  system  to  be  approximated  possesses  highly  resonant  dominant  poles,  i.e. 
poorly  damped  complex  poles,  convergence  of  the  series  will  be  slow,  since  the 
Laguerre  pole  is  on  the  real-axis.  This  can  also  be  seen  from  (4.4.14),  noting 
that  for  a  highly  resonant  pole  7^.,  arg(7fc)  ~  7t/2. 

(3)  If  the  system  G  has  a  wide  range  of  time-constants,  resulting  in  the  poles  being 
scattered,  convergence  will  be  slow.  This  is  because  in  this  case,  for  any  choice 
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°f  p,  |7 k  —  P |  will  be  large  for  some  k ,  and  consequently  there  will  be  z-plane 
poles  close  to  the  unit  circle. 

Thus  knowledge  of  dominant  time- const  ants  may  be  incorporated  in  a  Laguerre 
model  via  the  choice  of  the  Laguerre  shift  parameter  p.  Clearly  difficulties  with  the 
rate  of  convergence  arise  when  there  is  more  than  one  time-constant  involved  and 
these  are  widely  separated. 


Extended  Laguerre  Series  and  Frames 


It  has  been  suggested  [Wah91]  that  for  systems  with  m,  dispersed  time-constants,  the 
rate  of  convergence  may  be  improved  by  considering  models  of  the  form, 


g(s)  «  ffjr  cw(G)^S.  /' 

Hto  v+Pi) 


(4.4.15) 


where  the  a  priori  knowledge  of  time  constants  has  been  incorporated  in  the  choice 
of  the  m  parameters  p3.  It  was  also  suggested  in  [Wah91]  that  improved  convergence 
rates  for  systems  with  highly  resonant  poles  may  be  obtained  by  replacing  the  Laguerre 
filters  with  the  so-called  Kautz  filters  which  have  a  single  pair  of  complex  poles. 

The  improved  Laguerre  model  (4.4.15),  must  be  treated  as  an  ad  hoc  formulation 
if  one  is  restricted  to  the  setting  of  orthonormal  bases.  However  it  is  possible  to  make 
sense  of  such  a  formulation  using  frame  theory  as  is  stated  in  the  following  theorem. 


Theorem  4.2  Let  {p\, . .  .,Pk},  be  a  finite  collection  of  positive  constants,  and  let 
,  denote  the  mth  order  Laguerre  function  with  Laguerre  shift  parameter  p3 .  Then 
the  doubly-indexed  sequence  of  functions  j  —  1,...,  K\  m  —  0, 1, 2, . .  is  a 

tight  frame  for  H2(n+)  with  frame  bounds  A  =  B  =  K . 

Proof:  Since  for  any  fixed  j,  }m=o>  is  an  orthonormal  basis  for  H2(n+), 

00  i2 

X]  =  j|F||2,  vf  e  H2(n+),j  = 

771=0 

Thus 

K  00 

££|<c«)|  =Avf.  . 

1  m— 0 
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Hence  any  F  £  H2(n+),  lias  the  representation, 

K  oo 

f  =  YTl(F’s~1*%)iS& 

j—1  m=Q 

K  oo 

=  K-1  Y  Y  ( F *  *&)  (4.4.16) 

j= 1  m=0 

We  shall  refer  to  the  representation  (4.4.16)  as  an  extended  Laguerre  series  (ELS) 
representation.  Clearly  the  ELS  representation  is  not  unique. 

Remark:  It  is  convenient  at  this  point  to  note  an  important  similarity  between 
the  extended  Laguerre  sedries  (4.4.16)  and  wavelet  system  representations.  As  noted 
earlier  (see  (4.3.9))  is  simply  a  dilation  of  the  function  by  a  factor  of  2 p. 
Furthermore  in  the  last  section  it  was  shown  that  the  Laguerre  basis  may  be  viewed 
as  being  generated  via  right  shifts  of  a  single  function,  where  the  shift  is  defined  by 
the  Laguerre  shift  operator.  Consequently  the  extended  Laguerre  series  (4.4.16)  is  a 
representation  via  dilates  and  shifts  of  a  single  function,  where  the  dilation  operator 
is  as  defined  in  Chapter  2,  but  the  shift  operator  is  now  the  Laguerre  shift. 

For  Laguerre  models,  all  a  priori  knowledge  must  be  used  in  selection  of  the  single 
parameter  p.  In  particular  for  representation  of  a  delay  r  ,  we  would  like  to  choose 
p  =  2 N/r,  where  N  is  the  order  of  the  Laguerre  model,  and  knowledge  of  a  dominant 
time-constant  T,  leads  us  to  choose  p  =  1/T.  Clearly  there  may  arise  situations  where 
these  two  goals  are  conflicting,  for  example  in  the  case  of  a  long  time- delay  r,  and  a 
small  time  constant  T,  or  vice-versa.  Furthermore,  it  is  clear  that  there  is  no  direct 
mechanism  for  the  use  of  frequency  information  in  Laguerre  models. 

4.5  A  Priori  Knowledge  in  WS  Models  Revisited 

In  light  of  the  description  of  a  priori  knowledge  in  Laguerre  models,  let  us  re-examine 
the  use  of  a  priori  knowledge  in  WS  models  from  the  somewhat  more  concrete  point 
of  view  of  the  poles  of  the  model.  Note  that  selection  of  the  Laguerre  shift  parameter 
p,  is  actually  a  process  of  modifying  the  basis  functions  to  suit  the  problem  at  hand. 
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A  similar  technique  may  be  used  in  the  case  of  WS  models,  as  is  discussed  in  Section 

4.5.3. 

4.5.1  Time  Delays  and  Time  Constants 

As  mentioned  in  Section  4.2.1,  the  effects  of  a  time-delay  r,  can  be  captured  in  a 

wavelet  system  model  via  the  dilations.  To  make  this  statement  more  precise,  consider 

the  dilation  operator  Da,  on  L2(0,oo)  ,  as  defined  in  Section  2.2.  In  particular,  for 

WS  models,  the  appropriate  dilations  in  the  time  domain  are  of  the  form  D  -m.  The 

a0 

adjoint  _D*_m,  of  this  operator  is  simply  Dam ,  since 
a0  0 

(■ £>«-"*/,  <7 )  =  jH 

=  JQ  f(t)ao/29(aolt)dt  =  (/,  Da™g)  . 

Therefore  the  dilation  operator  is  not  a  shift  operator  in  the  same  sense  as  the  Laguerre 
shift  operator  T,  since  both  Da-m  and  J9*_m  are  (L2(0,oo)  )  isometries.  However,  we 

ao  a>Q 

can  still  view  Da-m  as  a  shift  operator  in  the  following  sense 

(Da-mtp)  ( t )  — ►  0  asm-+co  Vt  £  [0,ti],  t\  >  0. 

iD:  ,-*)  w  —  o  as  to  ->  oo  Vf  £  [t\,  oo],  t\  >  €  >  0, 

where  ^  is  the  inverse  Laplace  transform  of  an  admissible  H2(n+)  analyzing  wavelet. 
This  shifting  effect  of  the  dilation  operator  is  perhaps  clearer  if  we  look  at  time  on  a 
logarithmic  scale  since 

aom/,2V>(l°g aomt)  =  aom/2V>(logf  -  mlogao), 

and  we  know  that  ^(0)  =  0,  by  admissibility. 

For  a  given  delay  r,  the  dead- time  can  be  approximated  arbitrarily  well  by  a  WS 
model.  To  see  this  let  m,  be  the  smallest  integer  such  that  a Q7nr  <  1.  We  can  assume 
without  loss  of  generality  that  \fp(t)\2  <  et,  for  t  <  1,  and  e  >  0.  Recall  that  by  the 
admissibility  condition,  j^(f)|2,  must  decay  at  least  as  fast  as  t,  near  t  =  0.  Thus, 

f  |a0  mf)|  dt  =  f  |^(f)|2  dt 

J  o'  *  Jo 
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fa0  T  n  €  /  \  2 

<  e  J  t2dt  —  -  i[aQmrJ  — *  0  as  m  —*■  oo. 

So  for  good  representations  of  the  delay  r,  we  may  select  dilation  indices  m,  such  that 

logr 

m  >  - - . 

log  a o 

As  in  the  case  of  Laguerre  models,  there  is  a  tradeoff  between  the  representation 
of  long  delays  and  small  time  constants.  The  manner  in  which  this  tradeoff  arises  is 
precisely  the  same  in  the  WS  case  as  it  was  in  the  Laguerre  case.  This  follows  from 
the  observation  in  Section  4.3.1  that  the  Laguerre  function  §pm  is  simply  a  dilation 
of  by  a  factor  of  2 p.  Hence  in  both  cases,  dilations  are  chosen  to  represent  time 
delays.  The  effect  of  dilations  on  time  constants  is  easily  seen  by  observing  that  if 

| /(f)!  <  Ce~k\  k>  0 
then  \f(at)\  <  Ce~kat ,  k,a>  0. 

Hence  for  small  dilation  factors  a,  as  are  required  by  both  WS  and  Laguerre  models  for 
good  representations  of  long  delays,  the  time  constant  is  large.  Therefore  convergence 
of  the  WS  series  will  in  general  be  slow  in  cases  where  the  time  constants  are  small 
and  the  delay  is  long. 

It  is  clear  from  the  above  discussion  that  a  priori  information  about  a  dominant 
time  constant  T  may  be  incorporated  in  WS  models  by  selecting  dilation  indices  rn, 
such  that, 

cio  mT*  «  T, 

where  T$,  is  the  dominant  time  constant  of  the  analyzing  wavelet  T.  This  of  course 
corresponds  to  placing  the  poles  of  the  WS  model  on  specific  vertical  lines  in  the 
left  half-plane.  Note  that  WS  models  allow  us  to  incorporate  several  scattered  time 
constants  in  this  manner  as  well. 

4.5.2  Dominant  Frequencies 

Prior  knowledge  of  the  frequencies  associated  with  various  time  constants  may  also 
be  utilized  in  WS  models  by  appropriate  selection  of  dilation  indices  n.  Approximate 
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knowledge  of  time  constants  and  associated  frequencies  corresponds  to  approximate 
knowledge  of  the  pole  locations.  Using  dilations  and  translations  together,  poles  of  a 
WS  model  may  be  placed  close  to  the  approximate  poles  of  the  system.  In  fact  the 
poles  of  the  WS  model  may  be  placed  so  that  they  cover  a  region  in  which  the  poles 
of  the  true  system  are  thought  to  be. 


4.5.3  Optimizing  the  Analyzing  Wavelet 


Thus  far  we  have  not  really  utilized  the  full  flexibility  of  frame  theory  other  than 
to  construct  decompositions  in  terms  of  rational  analyzing  wavelets.  The  underlying 
structure  of  a  wavelet  system  decomposition  consists  of  the  analyzing  wavelet  the 
dilation  stepsize  ao,  and  the  translation  stepsize  bo-  There  exists  a  great  deal  of 
freedom  in  the  choice  of  the  triple  (^aoj&o),  within  the  restrictions  of  admissibility 
and  rationality  of  'f  and  the  sequence  {'®rm)n}  forming  a  frame.  Thus  we  can  consider 
the  problem  of  optimizing  (\H,  ao,  &o )>  so  as  to  suit  the  particular  problem  at  hand.  As 
mentioned  earlier,  this  is  essentially  the  idea  in  choosing  the  Laguerre  shift  parameter 
p,  for  the  Laguerre  basis. 

For  the  sake  of  convenience  and  to  clarify  the  discussion,  we  restrict  discussion 
to  the  case  where  the  analyzing  wavelet  'L  is  of  McMillan  degree  N  =  2,  and  has  no 
zeros.  Let  {7,7}  denote  the  poles  of  4r.  Hence  the  poles  of  the  WS  transfer  function 
Gm'n,  are 


aom  (l  +  inb0)  a0m  +  inb0) 

aom(7  -  inb0 )  %m(j  -  inb0 ) 

Assume  that  we  have  a  priori  knowledge  of  a  pole  u  of  the  system  to  be  approx¬ 
imated.  Then  we  can  try  to  reflect  this  knowledge  in  a  WS  model  by:  (1)  selecting 
dilation  indices  m  such  that, 


-m  „  V_ 

0  JRe  7’ 

and  (2)  selecting  translations  corresponding  translation  indices  n,  such  that 


(4.5.17) 


a0mnb0  ~  Im  v  —  a0  mlm  7. 


(4.5.18) 
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In  general  (4.5.17)  and  (4.5.18)  will  have  only  approximate  solutions  m  and  n.  How¬ 
ever  we  could  also  obtain  exact  solutions  by  modifying  the  frame  $ m>n  in  the  following 
manner.  Assume  a0  is  fixed. 

(1)  Select  the  pole  7,  of  the  analyzing  wavelet  such  that 

5?e  7  =  v.  (4.5.19) 

Note  that  this  involves  simply  dilating  the  analyzing  wavelet  '1.  and  therefore 
causes  no  problems  with  admissibility  or  forming  frames. 

(2)  Select  the  translation  stepsize  bo,  such  that  (41,  ao,  bo),  generates  an  affine  frame, 
and 

n*  =  &Q 1  (Im  v  —  I  my)  ,  (4.5.20) 

for  some  integer  n*.  Note  that  there  always  exists  such  solutions  to  (4.5.20) 
since  by  Theorem  2.6  there  exists  Bc,  such  that  (4>,ao,&o)>  generates  an  affine 
frame  for  every  bo  G  (0,  Bc).  Thus  , 

60  =  —  (Im  u  -  Im-y)  G  (0,  Bc) 

n 

for  |n|  large  enough. 

This  procedure  places  a  pole  of  G0,n  at  v. 

Multiple  time  constants  (Ti,...,Tfc),  may  be  handled  by  looking  for  solutions 
(m1,...,mfc),  to 

r- - — log(— Ke  j/Tj\  =  mj,  '  (4.5.21) 

log  ao 

In  general  the  above  equations  may  have  no  exact  solution  and  will  therefore  have  to 
be  solved  in  an  approximate  (e.g.  least  squares)  sense.  Another  approach  would  be 
to  match  the  longest  time  constant  exactly  according  to  the  method  described  above 
for  a  single  time  constant,  and  then  solve  (4.5.21)  approximately  to  represent  the 
remaining  time  constants.  Multiple  frequencies  associated  with  the  time  constants 
may  be  handled  analogously. 
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A  further  level  of  optimization  would  be  to  tailor  the  form  of  the  actual  analyzing 
wavelet  to  suit  the  problem.  Here  again  the  flexibility  of  frame  theory  can  be  utilized. 
We  will  not  discuss  this  approach  here,  but  give  reference  to  [CW90]  for  a  discussion 
of  these  types  of  ideas. 

4.6  WS  versus  Laguerre  Decompositions 

There  exist  many  similarities  and  as  well  as  differences  between  wavelet  system  models 
and  Laguerre  models.  In  this  section  we  highlight  some  of  these  similarities  and 
differences.  A  key  similarity  between  wavelet  system  decompositions  and  Laguerre 
decompositions  of  H|l(n+),  is  that  they  both  provide  representations  of  H|l(n+) 
in  terms  of  rational  functions.  Hence  truncations  of  either  of  these  representations 
give  rational  (H2)  approximations  to  transfer  functions  in  H|l(n+).  We  restrict  the 
following  discussion  to  the  class  Hjl(n+),  as  for  all  practical  purposes  this  is  the  class 
of  interest  when  dealing  with  real  physical  systems  with  transfer  functions  in  H2(n+). 

4.6.1  Parallel  versus  Series  Decompositions 

There  exists  an  essential  structural  difference  between  wavelet  system  decompositions 
and  Laguerre  decompositions  of  H|l(n+).  Wavelet  system  decompositions,  as  noted 
in  Chapter  3,  are  inherently  parallel  decompositions  in  terms  of  finite-dimensional 
systems  in  the  sense  that  every  term  in  the  series  is  a  rational  function  of  degree 
less  than  or  equal  to  2 N  where  N  is  the  McMillan  degree  of  the  analyzing  wavelet 
4’  6  RH2(n+).  Figure  4.5  graphically  illustrates  this  parallel  decomposition.  Laguerre 
decompositions  (4.3.10),  when  viewed  as  parallel  decompositions,  consist  of  terms  with 
unbounded  MacMillan  degree  (see  Figure  4.6).  It  is  perhaps  more  natural  to  think 
of  Laguerre  decompositions  as  series  decompositions  via  rational  functions  of  fixed 
degree,  in  particular  degree  1  (see  Figure  4.7).  This  structural  difference  is  important 
both  from  the  point  of  view  of  identification  and  approximation.  In  the  case  of  system 
identification,  we  give  the  following  argument  in  favor  of  the  parallel  decomposition 
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Figure  4.5:  Parallel  decomposition  via  WS  representations. 


structure  of  wavelet  system  representations.  Let, 

Gws(s-,a*)=  (4-6-22) 

(m,n)£j 

be  the  ‘identified’  wavelet  system  model  of  a  unknown  system  G,  i.e.  a*  is  the  pa¬ 
rameter  vector  obtained  via  a  chosen  identification  method.  Similarly  let, 


be  the  identified  Laguerre  model  of  the  same  system.  Let  Vws{°*)  denote  the  set  of 
indices  of  zero  (or  negligible)  components  of  the  parameter  vector  a*,  i.e. 

%s(a*)  =  {(ra»»)  :  am,n  <  e;  ("»,»)€  j},  €  >  0. 

Similarly  for  the  Laguerre  model  let, 

£>l(c*)  =  {m  :  |C|  <  e;  m  G  {0, 1, . . . ,  n  -  1}},  e  >  0. 

Now  we  can  ask  the  the  question;  is  it  possible  to  further  reduce  the  order  of  the  model 
given  knowledge  of  the  zero  (negligible)  coefficients  ( Vws{& *)  an(f  ^l(c*)),  without 
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Figure  4.6:  Laguerre  representations  in  parallel  form 


adversely  affecting  the  quality  of  the  model?  Clearly,  in  the  case  of  the  series  structure 
of  the  Laguerre  model,  this  is  directly  possible  if  and  only  if  n  —  1  6  T>l(c*).  In  that 
case  the  Laguerre  model  order  is  reduced  at  least  from  n  to  n—  1.  However,  in  the  case 
of  the  WS  model,  the  decoupled  nature  of  the  parallel  structure  allows  us  to  reduce 
the  model  order  by  simply  removing  any  terms  Gm,n  with  indices  (m,  to)  6  Vws{&*)- 
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The  resulting  decrease  in  model  order  is 


N  x  |{(m,n)  £  T>ws(,a*)  ■  n  =  0} |  +  2N  x  |{(m,  n)  £  Vws(ot*)  :  n  ^  0}| , 

where  N  is  the  McMillan  degree  of  the  analyzing  wavelet.  (Assuming  the  minimality 
conditions  of  3.4.5  are  satisfied.) 

Hence  in  the  case  of  WS  models,  it  is  possible  to  start  with  high-order  models, 
and  then  reduce  the  model  order  after  parameters  have  been  estimated  by  simply 
removing  terms  corresponding  to  negligible  parameter  values. 

4.6.2  A  Priori  Knowledge  in  WS  Models  versus  Laguerre  Models 

A  crucial  basis  for  comparison  of  WS  models  and  Laguerre  models,  is  the  manner  in 
which  each  of  these  two  techniques  facilitate  the  incorporation  of  a  priori  information. 
A  priori  knowledge  in  these  models  was  discussed  in  in  Sections  4.2.1  and  4.5,  in  the 
WS  case  and  in  Section  4.4  for  Laguerre  models.  The  following  is  a  qualitative  list 
comparing  a  priori  knowledge  in  WS  models  with  a  priori  knowledge  in  Laguerre 
models. 

(1)  Both  models  facilitate  the  use  of  time  constant  and  delay  information.  Further¬ 
more  both  models  suffer  from  a  tradeoff  between  representing  long  delays  and 
representing  short  time  constants. 

(2)  There  exists  no  mechanism  for  incorporating  frequency  information  such  as 
imaginary  parts  of  poles  in  Laguerre  models.  Such  information  may  be  used 
in  WS  models  in  the  selection  of  translations. 

(3)  WS  models  can  incorporate  knowledge  of  multiple  time  constants  while  Laguerre 
models  may  accurately  represent  only  one. 

(4)  The  only  parameter  of  a  Laguerre  model  is  the  Laguerre  shift  parameter  p.  This 
is  the  primary  cause  of  any  shortfalls  of  the  Laguerre  models.  WS  models,  on 
the  other  hand,  are  extremely  flexible.  In  selecting  a  WS  model,  one  has  the 
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freedom  to  choose:  (1)  dilation  and  translation  indices,  (2)  frame  parameters 
ao,  and  bo,  (3)pole  locations  of  the  analyzing  wavelet,  and  (3)  the  form  of  the 
analyzing  wavelet.  This  flexibility  can  be  utilized  to  adapt  the  model  to  a  given 
problem. 

4.6.3  Frames  versus  Orthonormal  Laguerre  Bases  in  H2(fl+) 

Of  some  interest  in  this  comparison  of  WS  models  and  Laguerre  models,  are  the 
consequences  of  using  frames  rather  than  orthonormal  bases.  The  two  formulations 
can  be  discussed  in  terms  of  ease  of  computation  and  ‘robustness’  of  representation. 

Given  a  particular  (known)  transfer  function  in  H|l(II+),  in  general  it  is  easier  to 
compute  the  decomposition  with  respect  to  an  orthonormal  basis,  than  the  decomposi¬ 
tion  with  respect  to  a  frame.  In  the  specific  case  of  Laguerre  bases,  the  computation  is 
even  further  simplified  by  the  well  known  recursion  formula  for  Laguerre  polynomials, 
i.e. 

T  2k-l-t  T  1  -  k  T  L  ^  0 
Lk{t)  —  - £  jk  2?  k  >  2, 

with  L0(t)  =  1  and  L\(t)  —  l  -  t.  For  frames  in  general  it  is  necessary  to  compute 
the  decomposition  via  the  inverse  frame  operator,  except  for  in  the  case  of  tight 
frames  where  the  computation  is  equivalent  to  that  for  a  general  orthonormal  basis. 
However,  for  identification  purposes,  there  is  little  or  no  computational  advantage  in 
using  orthonormal  systems  since  the  parameters  need  to  estimated  via  an  identification 
method  and  not  computed  directly  from  knowledge  of  the  transfer  function.  A  possible 
exception  to  this  is  in  the  case  where  the  model  is  is  fit  to  a  nonparametric  model  by 
explicitly  computing  the  expansion  coefficients  as  in  [CW92]. 

From  the  discussion  in  Section  2.3.1,  redundant  frame  representations  are  more 
robust  with  respect  to  perturbations  in  the  coefficients  than  are  orthonormal  basis 
representations.  From  the  point  of  view  of  identification,  this  is  an  important  property 
since  parameter  estimates  are  subject  to  inaccuracies  including  the  effects  of  numerical 
computation. 
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4.6.4  Convergence  In  Other  Norms 

One  of  the  important  properties  of  Laguerre  bases  is  the  property  of  being  uniformly 
bounded  bases  (4.3.7).  Uniform  boundedness  of  the  Laguerre  bases  is  important  in 
establishing  conditions  for  convergence  of  the  partial  sums  of  the  Laguerre  series 
in  norms  other  than  H2(II+).  Of  particular  significance  for  the  purpose  of  robust 
control  synthesis  is  convergence  in  the  H°°(II+)norm.  Convergence  results  in  H°°(II+), 
and  L1(0,oo),  with  emphasis  on  the  rate  of  convergence  are  discussed  in  [Mak90a, 
Mak90b,  GLP91b,  Mak91],  for  Laguerre  series  representations.  As  yet  we  have  no 
analogous  results  for  WS  representations,  although  there  is  reason  to  believe  that 
suitable  smoothness  and  decay  hypotheses  should  lead  to  similar  results. 

4.7  Examples:  WS  vs  Laguerre  Decomposition 

To  illustrate  some  features  of  the  comparison  between  Laguerre  and  WS  models,  we 
consider  the  following  two  examples.  In  these  examples,  we  use  the  H2(II+)  ana¬ 
lyzing  wavelet  of  Section  3.3.1,  Example  1,  with  7  =  5.0,  £  =  1.0,  and  frame  pa¬ 
rameters  do  =  2.0,6o  =  8.3.  The  corresponding  frame  bounds  are  approximately 
A  =  4.6,7?  =  6.6,  which  gives  B/A  =  1.4.  The  examples  discussed  below  are  based 
on  approximating  frequency  responses  using  both  the  Laguerre  and  WS  models.  The 
method  used  to  construct  WS  approximations  in  these  examples  is  as  follows.  First 
a  WS  approximation  is  first  computed  using  a  high-order  model.  The  lower-order 
approximations  are  then  computed  by  sequentially  selecting  terms  with  the  largest 
coefficients  in  the  high-order  approximation,  and  then  refitting  the  corresponding  ra¬ 
tional  functions  to  the  data  using  singular  value  decomposition  to  solve  the  appropriate 
normal  equations  (c.f.  remarks  at  the  end  of  Section  2.4). 

4.7.1  Rational  Modeling  of  Cochlear  Filters 

Within  the  mammalian  inner  ear  lies  a  key  component  of  auditory  system  known  as 
the  cochlea.  The  cochlea  performs  much  of  the  initial  processing  of  acoustic  signals. 
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The  basilar  membrane  is  a  structure  contained  within  the  cochlea  which  roughly 
speaking  performs  a  frequency  analysis  of  the  signals.  The  basilar  membrane  is  often 
modeled  as  a  bank  of  linear,  constant-Q,  bandpass  filters.  As  a  result  of  physiological 
experiments,  nonparametric  models  have  been  constructed  to  describe  the  frequency 
response  of  these  cochlear  filters  (see  [YWS92]).  For  convenience  in  working  with  these 
filters,  parametric  models  are  used  to  approximate  the  frequency  response.  These 
models  may  be  nonrational.  However,  it  is  also  of  considerable  interest  to  construct 
integrated  circuit  implementations  of  these  filters  (c.f.  [Mea89,  LKTS92]).  For  circuit 
implementations  of  the  cochlear  filters,  rational  models  are  required.  Figure  4.8,  shows 
the  frequency  response  of  one  of  these  filters. 


Figure  4.8:  Frequency  response  of  a  cochlear  filter. 

In  this  Section  we  apply  both  WS  models  and  Laguerre  models  to  approximating 
the  frequency  response  of  the  cochlear  filters,  and  compare  the  results.  For  this 
example,  1000  samples  of  the  frequency  response  from  u  =  0,  to  u>  =  50,  were  used  in 
the  approximation. 
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WS  Approximation  of  Cochlear  Filters 


The  nonparametric  model  shown  in  Figure  4.8,  provides  precisely  the  forms  of  fre¬ 
quency  and  time  domain  a  priori  information  required  to  choose  the  initial  dilation 
and  translation  indices  to  be  used  in  a  WS  model.  Having  selected  the  appropriate 
dilation  and  translation  indices,  the  resulting  WS  model  (which  may  be  of  fairly  high 
order)  can  then  be  fit  to  the  data.  Figures  4.9-4.10  show  the  results  of  this  process. 
Figure  4.9  are  three-dimensional  and  density  plots  of  the  magnitude  of  the  coefficients 
of  the  resulting  rational  approximation  to  the  cochlear  filter.  The  approximation  to¬ 
gether  with  the  original  function  and  impulse  response  are  shown  in  Figure  4.10.  It  is 
clear  that  the  number  of  ‘significant’  coefficients  is  small.  Thus  to  reduce  the  model 
order,  we  can  systematicly  select  truncations  of  this  high-order  model  by  eliminating 
terms  with  small  coefficients.  In  doing  so,  we  generate  a  sequence  of  rational  approx¬ 
imations  to  the  transfer  function  of  the  cochlear  filter.  Figures  4.11  -  4.13  show  the 
results  of  this  process  for  different  model  orders  N.  The  normalized  (L2)  approxi¬ 
mation  error  is  plotted  as  a  function  of  model  order  in  Figure  4.17.  It  is  clear  that  we 
can  select  a  reasonably  low-order  rational  approximation  to  the  cochlear  filter  based 
on  the  approximation  error.  However,  we  can  also  ask  the  question;  what  are  the  key 
properties  of  the  cochlear  filters  which  should  be  captured  by  the  model  in  order  to 
call  it  a  ‘good’  approximation.  It  has  been  suggested,  (c.f.  [Sha85a,  Sha85b])  that 
three  important  properties  of  the  the  cochlear  filters  are:  (i)  location  of  the  peak 
magnitude  of  the  frequency  response,  (ii)  the  sharp  cutoff  for  high  frequencies,  and 
(iii)  rapid  decay  in  the  time  domain.  Due  to  the  sharp  cutoff  for  high  frequencies,  the 
decay  in  the  time  domain  is  governed  by  how  the  frequency  response  behaves  as  u 
goes  to  zero.  From  Figures  4.11  -  4.13,  we  see  that  all  of  these  three  properties  can 
be  captured  by  rational  WS  approximations  of  fairly  low  order. 


97 


0.2 


■  ....  i  .  .  .  ■  .  .  ■  .  .  . — i  i  .  .  .  .I,,  i  .  - . -  ■  .1 . 

0  10  20  30  40  50  60 

Translations 


Figure  4.9:  Magnitude  of  coefficients  in  WS  approximation  to  cochlear  filter.  Three- 
dimensional  plot  (top)  and  density  plot  (bottom) 
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Figure  4.10:  High  order  WS  approximation  to  cochlear  filter  frequency  response. 
Laguerre  Approximation  of  Cochlear  Filters 

Laguerre  models  may  also  be  used  for  rational  approximation  of  the  cochlear  filter 
frequency  response.  The  Laguerre  shift  parameter  p,  was  selected  via  a  manual  op¬ 
timization  for  a  Laguerre  model  of  order  N  =  10.  Figures  4.14  -  4.16,  show  the 
resulting  Laguerre  approximations  for  different  models  orders.  As  should  be  ex¬ 
pected,  the  Laguerre  model  demonstrates  slow  convergence  in  this  case  due  to  the 
resonant  components  of  the  system.  Moreover,  it  is  evident  that  a  very  high  order 
Laguerre  model  is  required  to  capture  the  three  important  properties  of  the  system 
mentioned  above. 

Discussion 

Figure  4.17  is  a  plot  of  normalized  approximation  error  versus  model  order  for  both 
the  Laguerre  and  WS  model  approximations  to  the  cochlear  filter  frequency  response. 
It  is  clear  from  this  plot  that  the  WS  models  provide  higher  quality  approximants 
at  low  model  orders.  The  error  of  the  Laguerre  approximant  decreases  exponentially 


99 


0.2 
0.1 
0 

-0.1 
-0.2 

0  50  0  50 

N  =  8 

0.025 
0.02 
0.015 
0.01 
0.005 
0 

Figure  4.11:  WS  approximation  of  cochlear  filter  for  N  =  4  (top),  and  N  =  8  (bottom). 
Solid  curve:  approximation;  Dashed  curve:  nonparametric  model 


100 


Figure  4.12:  WS  approximation  of  cochlear  filter  for  N  =  16  (top),  and  N 
(bottom).  Solid  curve:  approximation;  Dashed  curve:  nonparametric  model 


Figure  4.13:  WS  approximation 
(bottom).  Solid  curve:  approxim; 


Impulse  Response 


N  =  60 


cochlear  filter  for  N  —  28  (top),  and  N 
n;  Dashed  curve:  nonparametric  model 
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Figure  4.15:  Laguerre  approximation  (p  =  8.4)  of  cochlear  filter  for  N  —  36  (top),  and 
N  =  52  (bottom).  Solid  curve:  approximation;  Dashed  curve:  nonparametric  model 
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Figure  4.17:  Normalized  approximation  error  versus  model  order  for  approximation 
of  cochlear  filter  frequency  response  using  WS  and  Laguerre  models 

with  model  order,  but  the  constant  factor  in  the  exponential  is  small  (as  is  indicated 
by  the  slope  of  the  curve  on  a  logarithmic  scale).  Although  the  error  decreases  quite 
rapidly  for  the  WS  approximations,  there  is  no  clearly  defined  exponential  rate  of 
convergence.  This  highlights  a  difficulty  with  the  WS  models,  which  is  that  conver¬ 
gence  rate  estimates  are  difficult  to  obtain  because  of  manner  in  which  truncations  are 
selected.  However,  in  any  application,  it  is  more  important  to  have  good  convergence 
than  to  have  good  theoretical  estimates  of  the  convergence  rate.  Note  that  the  the 
cochlear  filter  clearly  defines  a  system  which  lends  itself  to  compact  time-frequency 
localized  representation.  In  some  applications,  keeping  the  number  of  parameters  that 
need  to  be  estimated  small  may  be  more  important  than  the  model  order,  e.g.  for 
on-line  estimation  of  parameters.  A  Nth-order  Laguerre  model  requires  N  parame¬ 
ters,  whereas,  for  the  example  analyzing  wavelet  used  here  the  number  of  parameters 
in  a  Nth  order  model  is  N/ 2,  (N / 4  complex  parameters).  Figure  4.18  is  equivalent 
to  Figure  4.17,  where  now  the  abscissa  represents  the  number  of  parameters  in  the 
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model. 


4.7.2  Approximation  of  a  Delay  System 


Consider  now  the  simple  delay  system  with  transfer  function, 


G(s)  = 


s 2  +  qis  +  q2 


(4.7.24) 


Once  more,  we  apply  both  WS  and  Laguerre  techniques  to  approximating  this  system. 
To  make  this  example  somewhat  more  interesting,  we  select  r,  qlf  and  q2,  such  that  the 
representation  of  the  delay  and  the  time  constant  of  the  system  cause  poor  convergence 
of  both  the  Laguerre  and  WS  approximations.  We  wil  use  r  =  2.0,  qi  =  1.25,52  = 
0.40625.  For  this  example,  700  samples  of  the  frequency  response  from  u  =  0,  to 
u  =  20,  were  used. 


Number  of  Parameters 


Figure  4.18:  Normalized  approximation  error  versus  number  of  parameters  for  ap 
proximation  of  cochlear  filter  frequency  response  using  WS  and  Laguerre  models 
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WS  Approximation  of  Delay  System 

It  is  more  revealing  in  this  example  to  consider  the  quality  of  the  various  approxima¬ 
tions  in  the  time  domain.  In  Figure  4.19  the  coefficient  magnitudes  for  a  high  order 
WS  approximation  of  the  delay  system  (4.7.24)  are  plotted.  Note  that  the  coefficients 
are  well-localized.  An  interesting  feature  of  the  decomposition  shown  in  Figure  4.19 
is  that  if  we  equate  the  dilation  axis  to  the  time  axis  and  the  translation  axis  to  the 
frequency  axis,  we  see  that  the  delay  in  the  system  is  visually  evident  in  the  coeffi¬ 
cients.  Figures  4.20-4.21  show  the  WS  approximations  of  the  impulse  response  of  the 
system  4.7.24  for  various  model  orders  N . 

Laguerre  Approximation  of  Delay  System 

Laguerre  approximation  of  the  delay  system  4.7.24  requires  a  careful  selection  of  the 
Laguerre  shift  parameter  p.  In  many  cases  optimal  selection  of  p ,  may  be  quite 
difficult  as  we  shall  show  shortly.  To  illustrate  just  how  significant  the  choice  of  p 
can  be,  consider  Figure  4.22  in  which  two  curves  are  plotted  for  the  time-domain 
approximation  error  for  Laguerre  approximants  of  the  system  (4.7.24).  For  the  first 
curve  p,  was  held  constant  at  p  —  1.56,  as  the  model  order  was  varied.  This  value 
of  p  was  chosen  by  manual  optimization  for  a  eighth  order  model.  Note  that  the 
convergence  of  the  Laguerre  approximants  in  this  case  is  too  slow  to  be  of  any  practical 
utility.  The  second  curve,  which  demonstrates  much  better  convergence,  was  obtained 
by  manual  optimization  of  p,  every  time  the  model  order  was  changed.  The  problem 
with  the  time  domain  convergence  of  the  Laguerre  approximants  with  p  fixed  can 
partly  be  attributed  to  the  finite  length  of  the  data  in  the  frequency  domain  and  the 
fact  that  the  Laguerre  functions  decay  only  as  s-1,  at  infinity.  The  point  to  be  made 
however  is  that  the  choice  of  p,  can  be  crucial  especially  for  low  order  approximants  as 
noted  in  [CW92].  The  problem  of  choosing  optimal  values  of  p,  may  in  fact  prove  to 
be  quite  difficult  due  to  the  complicated  dependence  of  the  error  on  p  .  This  is  perhaps 
best  illustrated  by  Figure  4.23,  which  is  taken  from  Cluett  and  Wang  [CW92].  The 
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Figure  4.19:  Magnitude  of  coefficients  in  WS  approximation  of  delay  system.  Three¬ 


dimensional  plot  (top)  and  density  plot  (bottom) 
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Figure  4.20:  WS  approximation  of  delay  system  for  N  =  6,12,30.  Solid  curve:  ap¬ 
proximation  of  impulse  response;  Dashed  curve:  actual  impulse  response 
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Figure  4.21:  WS  approximation  of  delay  system  for  N  =  40,60,80.  Solid  curve: 
approximation  of  impulse  response;  Dashed  curve:  actual  impulse  response 
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Model  Order 

Figure  4.22:  Normalized  (time-domain)  error  in  Laguerre  approximation  of  the  exam¬ 
ple  delay  system 

surface  shown  in  Figure  4.23  is  the  £2  norm  of  the  first  N,  Laguerre  coefficients,  where 
N  is  model  order,  in  the  Laguerre  approximation  to  a  frequency  response  estimate  for  a 
laboratory  scale  process  trainer  (see  [Lju87] ;  data  available  in  MATLAB  (DRYER2)). 
For  good  approximations,  it  is  desirable  to  choose  p  so  as  to  maximize  the  value  of  the 
function  plotted  in  Figure  4.23.  It  should  be  noted  that  the  surface  in  Figure  4.23,  is 
for  approximation  of  a  particular  system  and  in  general  the  dependence  of  the  error 
on  p,  and  model  order  may  be  far  more  complicated. 

Figures  4.24-4.25  show  the  Laguerre  approximation  of  the  impulse  response  of  the 
system  (4.7.24),  using  the  manually  determined  values  of  p,  shown  in  Figure  4.22. 
The  (time-domain)  approximation  error  is  plotted  against  the  model  order  in  Figure 
4.26  for  both  the  Laguerre  and  WS  models. 
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Figure  4.23:  Surface  representing  norm  of  Laguerre  approximant  to  a  frequency  re¬ 
sponse  estimate  for  a  laboratory  scale  process  trainer.  Taken  from  Cluett  and  Wang 
(1992) 

4.8  Summary 

In  this  chapter  we  examined  some  aspects  of  the  use  of  truncated  WS  representations 
as  parametric  ‘black  box’  models  for  system  identification.  The  manner  in  which  WS 
representations  are  used  here  is  closest  in  spirit  to  the  use  of  Laguerre  function  repre¬ 
sentations  in  the  approximation  and  identification  of  linear  systems.  Some  properties 
of  WS  models  which  make  them  amenable  to  use  in  problems  of  system  identification 
and  rational  approximation  are: 

(1)  WS  models  are  linear-in-parameters. 

(2)  Both  time  and  frequency  domain  a  priori  information  may  be  incorporated  in 
WS  models. 

(3)  The  parallel  nature  of  the  models  allow  for  a  second  stage  of  model  reduction 
following  parameter  estimation. 

(4)  The  underlying  building  blocks  of  WS  models  are  based  on  frames  and  thus 
there  exists  a  great  deal  of  flexibility  in  the  choice  of  these  building  blocks. 
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Figure  4.24:  Laguerre  approximation  of  delay  system  for  N  —  8, 10, 30.  Laguerre  shift 
parameter  p  is  indicated  on  the  plots.  Solid  curve:  approximation  of  impulse  response; 
Dashed  curve:  actual  impulse  response 


114 


Figure  4.25:  Laguerre  approximation  of  delay  system  for  N  =  40,60,80.  Laguerre 
shift  parameter  p  is  indicated  on  the  plots.  Solid  curve:  approximation  of  impulse 
response;  Dashed  curve:  actual  impulse  response 
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Model  Order 

Figure  4.26:  Approximation  error  versus  model  order  for  approximation  of  the  example 
delay  system  by  Laguerre  models  and  WS  models.  The  Laguerre  shift  parameter  p, 
was  ‘manually  optimized’  each  time  the  model  order  was  increased. 

It  was  shown  that  WS  models  may  perform  considerably  better  than  Laguerre 
models  for  certain  classes  of  problems.  Much  of  the  difficulty  with  the  Laguerre  model 
can  be  associated  with  the  fact  that  there  is  only  one  parameter,  namely  the  Laguerre 
shift  parameter  p,  which  can  be  used  to  control  the  performance  of  the  Laguerre  model 
far  any  given  problem.  Table  4.1  briefly  summarizes  some  aspects  for  comparison  of 
WS  models  and  Laguerre  models.  A  column  is  included  for  the  extended  Laguerre 
models  (c.f.  Section  4.4)  which  possess  some  characteristics  of  both  WS  models  and 
the  classical  Laguerre  models. 

Some,  as  yet  unanswered,  questions  which  we  regard  as  topics  for  further  research 
are 

(1)  What  are  the  theoretical  rates  of  convergence  for  WS  approximation  of  vari¬ 
ous  classes  of  transfer  functions,  and  how  do  these  compare  with  other  known 
rational  approximation  techniques  such  as  Laguerre  approximation. 
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(2)  What  can  be  said  about  the  selection  of  truncations  of  WS  representations  so 
as  to  ensure  optimal’  convergence. 

(3)  Under  what  conditions  can  convergence  of  WS  approximations  in  other  norms, 
in  particular  the  H°°(n+)  norm,  be  established. 

The  answer  to  the  first  question  above  may  in  general  be  quite  complicated  since  it 
depends  on  the  particular  scheme  employed  for  the  selection  of  truncations. 

We  also  showed  that  the  ad  hoc  procedure  (c.f.  (4.4.15))  suggested  by  Wahlberg 
[Wah91],  for  improving  the  performance  of  Laguerre  models  on  systems  with  multiple 
dispersed  time  constants,  leads  to  a  class  of  extended  Laguerre  series  representations 
when  viewed  from  the  viewpoint  of  frame  theory.  This  observation  creates  a  framework 
for  analyzing  the  convergence  properties  of  such  models. 
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WS 

Laguerre 

Extended 

Laguerre 

Model  type 

black-box, 

linear-in- 

par  ameters 

black-box, 

linear-in¬ 

parameters 

black-box, 

linear-in¬ 

parameters 

Approximation 

space 

H2(n+) 

H2(n+) 

H2(n+) 

Decomposition 

structure 

parallel 

series 

parallel-series 

A  priori  knowledge 

multiple 

time  constants, 

delays, 

frequency  ranges 

time  constant, 

delay 

multiple 

time  constants, 

delays 

Representation 

frames 

orthonormal 

bases 

frames 

Adaptability  of 

model 

m,  n,  (a0,60,^) 

p,  model  order 

{Pj}f=v  K> 

model  order 

Sensitivity  to  pa¬ 
rameter 

perturbations 

robust  for  re¬ 
dundant  frames 

sensitive 

robust  due  to 

redundancy 

Convergence  in 

other  norms 

Undetermined 

H°°(n+), 

L°°(1R+),  for 

certain  classes 

Table  4.1:  WS  vs  Laguerre  models 
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Chapter  5 


Wavelet  Analysis  and  Synthesis 
of  Feedforward  Neural  Networks 


In  this  chapter,  we  investigate  the  use  of  affine  wavelet  frames  in  analyzing  and  syn¬ 
thesizing  feedforward  neural  networks.  Feedforward  neural  networks  are  ‘biologically 
inspired’  computational  architectures  which  have  found  application  in  problems  of 
approximating  static  mappings  based  on  discrete  data.  As  noted  in  Chapter  1,  prob¬ 
lems  of  this  nature  arise  in  a  wide  variety  of  fields.  Clearly  feedforward  networks  are 
not  the  only  means  of  solving  such  problems.  Approximation  theory  is  a  very  well- 
developed  area  of  mathematics,  and  there  exist  numerous  ‘conventional’  approaches 
to  these  problems  (e.g.  orthogonal  basis  approximations).  The  main  attraction  of  the 
neural  network  approach  has  been  the  empirically  demonstrated  success  in  problems 
involving  mappings  with  high- dimensional  domains  and/or  ranges.  For  such  problems, 
where  the  domain  is  high  dimensional,  the  more  structured  techniques  of  classical  ap¬ 
proximation  theory  often  lead  to  computationally  intractable  formulations.  The  use 
of  neural  networks  may  be  thought  of  as  a  somewhat  naive  approach  in  the  sense 
that  to  date  there  does  not  exist  much  structure  in  the  approach.  It  has  been  known 
for  some  time  that  feedforward  networks  with  particular  classes  of  ‘activation  func¬ 
tions’  are  capable  of  generating  ‘good’  approximations  to  certain  classes  of  mappings. 
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These  results  (c.f.  [Cyb89,  Cyb88,  HSW89,  HSW90])  have  for  the  most  part  been 
based  on  arguments  of  density  of  the  class  of  mappings  implement  able  in  feedforward 
networks,  in  various  function  spaces.  These  results  may  be  regarded  as  justification 
for  the  use  of  such  networks  in  approximation  problems.  However,  the  nature  of 
these  arguments  do  not  reveal  precisely  the  feedforward  network  implementation  of 
a  given  mapping.  In  this  sense,  they  are  nonconstructive  results.  It  is  thus  natural 
to  ask  whether  there  is  an  alternative  formulation  which  allows  one  to  describe  the 
approximation  properties  of  feedforward  networks  and  at  the  same  time  gain  some 
insight  into  the  exact  feedforward  network  implementation  of  a  given  mapping.  In 
this  chapter  it  is  shown  that  affine  wavelet  theory  can  provide  precisely  this  type  of 
formulation.  Our  approach  to  this  problem  was  initially  motivated  by  the  observa¬ 
tion  that  feedforward  neural  network  architectures  inherently  posses  a  translation  and 
dilation  structure.  Furthermore,  since  the  theory  of  frames  permits  a  great  deal  of 
flexibility  in  the  choice  of  analyzing  wavelets,  it  was  natural  to  try  to  combine  these 
two  observations  to  formulate  a  wavelet  description  of  feedforward  networks. 

For  the  sake  of  completeness,  we  briefly  review  some  of  the  salient  features  of  the 
feedforward  network  approach  to  approximation  in  Section  5.1.  The  remainder  of  this 
chapter  is  organized  as  follows. 

In  Section  5.2  it  is  shown  that  the  inherent  translation  and  dilation  structure 
of  feedforward  networks  may  be  utilized  to  implement  affine  wavelet  decompositions 
within  the  standard  architecture  of  feedforward  neural  networks.  For  simplicity  we 
restrict  discussion  primarily  to  single-input-single-output  networks.  However,  the  the¬ 
oretical  results  of  this  chapter  apply  to  the  higher-dimensional  settings  as  wrell,  as  is 
briefly  discussed  in  Sections  5.2.3  and  5.4. 

Among  topics  which  have  received  some  research  attention  in  the  are  of  neural 
networks,  is  the  choice  of  ‘activation  functions’.  One  of  the  most  commonly  used 
activation  functions  is  the  so-called  sigmoidal  function.  Section  5.2.1  is  concerned  with 
constructing  affine  (wavelet)  frames  using  combinations  of  sigmoids.  It  is  important 
to  point  out  that  there  is  no  a  priori  need  to  restrict  oneself  to  sigmoidal  activation 
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functions.  However,  by  doing  so  the  power  and  flexibility  of  frame  theory  is  clearly 
demonstrated. 

The  main  analysis  result  of  this  chapter  is  Theorem  5.1  which  states  that  wavelet 
decompositions  of  L2(IR)  may  be  implemented  within  the  standard  architecture  of 
feedforward  networks  with  sigmoidal  activation  functions.  Thus  for  a  given  mapping  in 
L2(IR)  ,  the  feedforward  network  implementation  can  be  readily  computed.  Theorem 
5.1  is  easily  extended  to  higher  dimensions  (L2(IRn)  ),  and  we  briefly  describe  this 
extension  in  Section  5.2.3. 

In  Section  5.3  we  outline  two  schemes  in  which  spatio-spectral  localization  prop¬ 
erties  of  wavelets  are  used  to  formulate  synthesis  procedures  for  feedforward  neural 
networks.  It  is  shown  that  such  synthesis  procedures  can  result  in  systematic  defi¬ 
nition  of  network  topology  and  simplified  network  ‘training’  problems.  Most  of  the 
weights  in  the  network  are  determined  via  the  synthesis  process  and  the  remaining 
weights  may  be  obtained  as  a  solution  to  a  convex  optimization  problem.  Since  the 
resulting  optimization  problem  is  one  of  least  squares  approximation,  the  remaining 
weights  can  also  be  determined  by  solving  the  associated  ‘normal  equations’.  The 
synthesis  schemes  may  be  theoretically  extended  to  higher  dimensions.  However, 

A  few  simple  numerical  simulations  of  the  methods  of  this  chapter  are  provided  in 
Section  5.3.4. 

5.1  Background  on  Feedforward  Neural  Networks 

In  this  section  we  provide  a  brief  introduction  to  the  use  of  feedforward  neural  networks 
to  functional  approximation  problems. 

Let  0  be  a  set  containing  pairs  of  sampled  inputs  and  the  corresponding  outputs 
generated  by  an  unknown  map,  /  :  IRm  — ►  IR11,  m,n  <  oo,  i.e.  0  =  {(ad,?/*)  :  yl  = 
/( ad);  ad  G  IRm,  y1  G  IRn,i  =  1,...,K,  K  <  cxi}.  We  call  0  the  training  set.  Note 
that  the  samples  in  0  need  not  be  uniformly  distributed.  The  problem  at  hand  is  to 
use  the  data  provided  in  0  to  ‘learn’  (approximate)  the  map  /.  Many  existing  schemes 
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to  perform  this  task  are  based  on  parametrically  fitting  a  particular  functional  form 
to  the  given  data.  Simple  examples  of  such  schemes  are  those  which  attempt  to  fit 
linear  models  or  polynomials  of  fixed  degree  to  the  data  in  0.  More  recently,  nonlinear 
feedforward  neural  networks  have  been  applied  to  the  task  of  ‘learning’  the  map  /. 

5.1.1  Feedforward  Neural  Networks 

The  basic  component  in  a  feedforward  neural  network  is  the  single  ‘neuron’  model 
depicted  in  Figure  5.1(a).  Where  ui,...,un  are  the  inputs  to  the  neuron,  ki,...,kn 


►y 


(b) 


Figure  5.1:  (a)Single  neuron  model,  (b)  Simplified  schematic  of  single  neuron 

are  multiplicative  weights  applied  to  the  inputs,  I  is  a  biasing  input,  g  :  IR  —  IR, 
and  y  is  the  output  of  the  neuron.  Thus  y  =  g(J27-i  ^iui  +  /)■  The  ‘neuron’  of 
Figure  5.1(a)  is  often  depicted  as  shown  in  Figure  5.1(b)  where  the  input  weights, 
bias,  summation,  and  function  g  are  implicit.  Traditionally,  the  activation  function 
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g  has  been  chosen  to  be  the  sigmoidal  nonlinearity  shown  in  Figure  5.2.  This  choice 


Figure  5.2:  Sigmoidal  activation  function. 


of  g  was  initially  based  upon  the  observed  firing  rate  response  of  biological  neurons. 
A  feedforward  neural  network  is  constructed  by  interconnecting  a  number  of  neurons 
(such  as  the  one  shown  in  Figure  5.1)  so  as  to  form  a  network  in  which  all  connections 
are  made  in  the  forward  direction  (from  input  to  output  without  feedback  loops)  as  in 
Figure  5.3.  Neural  networks  of  this  type  usually  consist  of  an  input  layer,  a  number  of 
hidden  layers,  and  an  output  layer.  The  input  layer  consists  of  neurons  which  accept 
external  inputs  to  the  network.  Inputs  and  outputs  of  the  hidden  layers  are  internal 
to  the  network,  and  hence  the  term  ‘hidden’.  Outputs  of  neurons  in  the  output  layer 
are  the  external  outputs  of  the  network.  Once  the  structure  of  a  feedforward  network 
has  been  decided,  i.e  the  number  of  hidden  layers  and  the  number  of  nodes  in  each 
hidden  layer  has  been  set,  a  mapping  is  ‘learned’  by  varying  the  connection  weights, 
Wij' s  and  the  biases,  I/s  so  as  to  obtain  the  desired  input-output  response  for  the 
network1. 

One  method  often  used  to  vary  the  weights  and  biases  is  known  as  the  backprop- 
agation  algorithm  in  which  the  weights  and  biases  are  modified  so  as  to  minimize  a 
'We  will  use  Wi3  to  denote  the  weight  applied  to  the  output  Oj  of  the  jth  neuron  when  connecting 
it  to  the  input  of  the  ith  neuron.  I}  is  the  bias  input  to  the  jth  neuron. 


123 


X 


Input  Layer 


Hidden  Layers 


Output  Layer 


y 

Figure  5.3:  Multilayered  feedforward  neural  network. 


cost  functional  of  the  form, 


E=  £  110*'-^,  (5.1.1) 

(x',3/‘)6© 

where  O'  is  the  output  vector  (at  the  output  layer)  of  the  network  when  xl  is  applied 
at  the  input.  Backpropagation  employs  gradient  descent  to  minimize  E.  That  is,  the 
weights  and  biases  are  varied  in  accordance  with  the  rules, 


At Vij  —  —e 


dE 

dwij 


and 


A/j  = 


dE 


Remarks:  Although  feedforward  neural  networks  have  empirically  demonstrated  an 
ability  to  approximate  complicated  maps  very  well  using  the  technique  just  described, 
to  date  there  does  not  really  exist  a  satisfactory  theoretical  foundation  for  such  an 
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approach.  To  clarify  the  meaning  of  a  satisfactory  theoretical  foundation,  let  us  list 
some  of  the  problems  that  one  should  be  able  to  address  within  a  good  theoretical 
setting. 

(1)  Development  of  a  well-founded  systematic  approach  to  choosing  the  number  of 
hidden  layers  and  the  number  of  nodes  in  each  hidden  layer  required  to  achieve 
a  given  level  of  performance  in  a  given  application. 

(2)  Learning  algorithms  often  ignore  much  of  the  information  contained  in  the  train¬ 
ing  data,  and  thereby  overlook  potential  simplification  of  the  weight  setting 
problem.  As  we  will  show  later,  preprocessing  of  training  data  can  simplify  the 
training  problem. 

(3)  An  inability  to  adequately  explain  empirically  observed  phenomena.  For  ex¬ 
ample,  the  cost  functional  E  may  possess  many  local  minima  due  to  the  non- 
linearities  in  the  network.  A  gradient  descent  scheme  such  as  backpropagation 
is  bound  to  settle  to  such  local  minima.  However,  in  many  cases,  it  has  been 
observed  that  settling  to  a  local  minimum  of  E  does  not  adversely  affect  over¬ 
all  performance  of  the  network.  Observations  such  as  this  demand  a  suitable 
explanatory  theoretical  framework. 

The  methods  of  this  chapter  offer  a  framework  within  which  it  is  possible  to  address 
at  least  the  first  two  issues  above. 

5.2  Dilations  and  Translations  in  SISO  Neural  Net¬ 
works 

In  this  section  we  shall  demonstrate  how  affine  wavelet  decompositions  of  L2(1R)  can 
be  implemented  within  the  architecture  of  single-input-single-output  (SISO)  feedfor¬ 
ward  neural  networks  with  sigmoidal  activation  functions.  Consider  the  SISO  feed¬ 
forward  neural  network  shown  in  Figure  5.4.  Input  and  output  layers  of  this  network 
each  consist  of  a  single  node,  whose  activation  function  is  linear  with  unity  gain.  In 
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Figure  5.4:  SISO  feedforward  neural  network 


addition,  the  network  has  a  single  hidden  layer  with  N  nodes,  each  with  activation 
function  g(-).  Hence  the  output  of  this  network  is  given  by 

N 

y  =  fix)  =  XI  w3,N+ig(w0,jX  -  Ij ),  (5.2.2) 

J=1 

where  we  have  labeled  the  input  node  0  and  the  output  node  N  +  1. 

The  key  observation  here  is  that  the  mapping  /  (5.2.2)  implemented  by  the  SISO 
feedforward  network,  is  a  Unear  combination  of  dilates  and  translates  of  a  single  func¬ 
tion  (the  activation  function  g).  Thus  the  mapping  implemented  in  a  SISO  feedforward 
neural  network  is  precisely  of  the  form  of  an  affine  wavelet  decomposition  (2.2.25). 
The  key  differences  are:  (i)  The  summation  in  (5.2.2)  is  finite,  and  (ii)  Even  if  we 
permit  infinitely  many  hidden  layer  nodes,  and  let  gj  ~  g(wo1jX  —  Ij ),  the  infinite 
sequence  {gn}  will  not  necessarily  be  a  frame  for  L2(IR)  . 

To  formulate  a  wavelet  description  of  the  feedforward  network  of  Figure  5.4,  it 
is  thus  necessary  to  investigate  frames  generated  by  translations  and  dilations  of  the 
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activation  function  g.  In  the  next  section  we  discuss  how  sigmoidal  activation  functions 
may  be  combined  to  construct  admissible  analyzing  wavelets  and  thereby  affine  frames. 


5.2.1  Affine  Frames  From  Sigmoids 

As  mentioned  earlier,  sigmoidal  functions  are  among  the  most  commonly  used  ac¬ 
tivation  functions  in  feedforward  networks.  Although  it  is  not  necessary  to  restrict 
attention  to  sigmoidal  activation  functions,  we  do  so  here  to  illustrate  the  power  and 
flexibility  offered  by  frame  theory  in  the  construction  of  analyzing  wavelets.  Recall 
from  Chapter  2  that  essentially  the  only  restriction  on  the  analyzing  wavelet  is  the 
admissibility  condition  (2.2.8). 

Consider  the  sigmoidal  function,  s(ar)  =  (1  +  e~x)~l  shown  in  Figure  5.2.  Since 
s  £  L2(IR)  ,  it  is  impossible  to  construct  a  frame  for  L2(1R)  using  individual  translated 
and  dilated  sigmoids  as  frame  elements.  Note  however,  that  the  difference  of  two 
translated  sigmoids  is  in  L2(IR)  for  finite  translations  and  that  in  general  if  we  let 

M  M 

<p(?)  =  Yj  S*nbn(x)  -  S<n <*„(*)  (5'2‘3) 

n— 1  7i=l 

where  M  <  oo  and  sab(x)  =  s(ax  —  b),  a,  b  <  oo  then  <p  £  L2(IR)  .  Furthermore,  note 
that  the  combinations  (5.2.3)  are  precisely  of  the  form  which  can  be  implemented  in 
the  architecture  of  a  feedforward  network.  With  these  observation,  it  is  easily  shown 
that  affine  frames  for  L2(IR)  may  be  constructed  using  analyzing  wavelets  generated 
as  combinations  of  sigmoids  of  the  form  (5.2.3). 

Let  s(x)  =  (1  +  e-9ar)-1,  where  q  >  0  is  a  constant  which  controls  the  ‘slope’  of 
the  sigmoid  .  To  obtain  a  function  in  L2(1R)  ,  we  combine  two  sigmoids  as  in  (5.2.3). 
Let 

<p(x)  —  s(x  +  d)  —  s(x  —  d),  0  <  d  <  oo.  (5.2.4) 


So,  <£>(•)  is  an  even  function  which  decays  exponentially  away  from  the  origin.  Now, 
let 


xp(x)  =  <p(x  +  p)  -  <p(x  -  p ). 


(5.2.5) 
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Figure  5.5:  (Top)  Analyzing  wavelet  candidate  constructed  from  three  sigmoids 
(ip(x)  =  s(x  +  2)  -  2s(x)  +  s(z  -  2)).  (Bottom)  Square  magnitude  of  the  Fourier 
transform  of  ip,  (|^|2). 


Thus  ip(-)  (see  Figure  5.5)  is  an  odd  function,  with  / ip(x)dx  =  0,  which  is  dominated 
by  a  decaying  exponential.  It  is  easily  shown  that  ip  satisfies  the  admissibility  condition 
(2.2.8).  The  Fourier  transform  of  <p  is  given  by 


<p(u)  —  f  <p(x)e  Ju,xdx  = 

1r 


2ir  sm(u>d) 


(5.2.6) 


Therefore  the  Fourier  transform  of  ip  is, 


#*>) 


e~>p“<p( u)  -  ejpuj${uj)  = 


A'x  sin(pu;)  sin(cftu) 

3T  smh(^) 


(5.2.7) 


The  function  ip  and  the  square  magnitude  of  its  Fourier  transform  (|V>|2)  are  shown 
in  Figure  5.5  for  p  =  d  =  1  and  q  -  2.  Note  that  the  function  ip  is  reasonably 
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well-localized  in  both  the  time  and  frequency  domains.  Table  5.1  lists  some  relevant 
parameters  describing  the  (numerically  determined)  localization  properties  of  if).  For 
this  choice  of  (p,  d ,  q )  (and  in  general  whenever  p  —  d)  ip  is  a  linear  combination  of 
three  sigmoids, 

i>(x)  =  s(x  +  2)  -  2s(x)  +  s(x  —  2). 

Figure  5.6  shows  the  implementation  of  ip  in  a  feedforward  network. 


Figure  5.6:  Feedforward  network  implementation  of 


€ 

e 

xc{i)) 

^c(|^|2) 

e-supp(^,e) 

e-supp(|^|2,e) 

0.1 

0.1 

0.0 

0.9420 

[-2.15,2.15] 

[0.2920,  1.5920] 

Table  5.1:  Time-frequency  localization  properties  of  ip  for  ( p,d,q )  =  (1, 1,2) 
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Dilation  and  Translation  Stepsizes  for  the  Wavelet  Constructed  From  Sig- 
moids 

As  in  Chapter  3,  Theorem  2.6  may  be  applied  to  numerically  determine  translation 
and  dilation  stepsizes  a0  and  bo,  for  which  the  family  of  functions  forms  an 

affine  frame  for  L2(IR)  ,  where  the  analyzing  wavelet  ip  is  as  in  (5.2.5).  Numerical 
results  of  applying  Theorem  2.6  and  Corollary  2.1,  with  dilation  stepsize  a  =  2.0, 
to  the  construction  of  an  affine  frame  using  the  analyzing  wavelet  ip  (5.2.5)  with 
(p,d,q)  =  (1, 1,2)  are  shown  in  Figures  5. 7-5.8.  Figure  5.7  shows  the  estimates  of  the 
upper  and  lower  frame  bounds,  A  and  B,  for  various  values  of  the  translation  stepsize 
b.  Figure  5.8  is  a  plot  of  the  ratio  B /A  versus  the  translation  stepsize.  From  these 
results  we  see  that  for  a  =  2o  and  0  <  b  <  3.5,  (ip,a,b)  generates  an  affine  frame  for 
L2(IR)  . 

Remarks 

The  conditions  in  Theorem  2.6  and  subsequently  those  in  Corollary  2.1,  in  general 
may  be  very  conservative  since  the  theorem  relies  on  the  Cauchy- Schwarz  mequality 
to  establish  bounds. 

In  some  applications,  it  may  be  desirable  to  use  a  ‘sparsely  distributed’  frame 
to  ‘cover’  a  given  time  interval  and  frequency  band  using  a  small  number  of  frame 
elements.  As  can  be  seen  from  Figure  5.8,  sparsity  can  be  achieved  to  some  extent  at 
the  cost  of  ‘tightness’  of  the  frame. 
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Figure  5.7:  Estimates  of  frame  bounds,  using  analyzing  wavelet  -0  constructed  from 
sigmoids,  with  ao  =  2,  as  translation  stepsize  bo  is  varied.  Solid  curve:  lower  frame 
bound  A;  Dashed  curve:  upper  frame  bound  B. 


Figure  5.8:  Ratio  (B/A)  of  estimated  frame  bounds  using  analyzing  wavelet  con¬ 
structed  from  sigmoids,  with  ao  =  2,  as  translation  stepsize  bo  is  varied.  Solid  curve: 
B/A  Dashed  line:  constant=1.0 
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5.2.2  Feedforward  Network  Analysis  Theorem 

It  now  follows  that  we  have  constructively  proved  the  following  analysis  result  for 
SISO  feedforward  networks  with  sigmoidal  activation  functions. 

Theorem  5.1  Feedforward  neural  networks  with  sigmoidal  activation  functions  and 
a  single  hidden  layer  can  represent  any  function  f  E  L2(IR)  .  Moreover,  given  f  E 
L2(IR)  ,  all  weights  in  the  network  are  determined  by  the  wavelet  frame  expansion  of 

f, 

f(x)  =  Y  (/’  5"~  Vmn)  'f’mn(x). 

m,n 


Remarks: 

(a)  In  this  section  we  have  concentrated  on  wavelets  constructed  from  sigmoids.  We 
would  however  like  to  point  out  that  the  methods  of  this  section  are  applicable 
to  a  wide  variety  of  choices  of  the  activation  function.  For  some  examples  of 
other  activation  functions  we  refer  the  reader  to  [SW89]. 

(b)  Among  other  activation  functions  used  in  neural  networks,  is  the  discontinuous 
sigmoid  (step)  function.  Note  that  using  such  a  step  function  together  with 
the  methods  of  this  section  results  in  a  analyzing  wavelet  which  is  the  Haar 
wavelet.  Dilates  and  translates  of  the  Haar  function  generate  an  orthonormal 
basis  for  L2(IR)  .  The  Haar  transform  is  the  earliest  known  example  of  an  affine 
wavelet  transform. 

5.2.3  Wavelets  For  L2(lRn)  Constructed  From  Sigmoids 

Although  we  shall  primarily  restrict  attention  to  the  one-dimensional  setting  (L2(1R)  ), 
wavelets  for  higher  dimensional  domains  (L2(lRn)  )  may  also  be  constructed  within 
the  standard  feedforward  network  setting  with  sigmoidal  activation  functions.  In  ap¬ 
plications  such  as  image  processing  it  is  desirable  to  use  wavelets  which  exhibit  orien¬ 
tation  selectivity  as  well  as  spatio-spectral  selectivity.  In  the  setting  of  Multiresolution 
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Analysis  [Mal89c]  for  example,  wavelet  bases  for  L2(IR2)  are  constructed  using  tensor 
products  of  wavelets  for  L2(IR)  and  the  corresponding  ‘smoothing’  functions.  This 
method  results  in  three  analyzing  wavelets  for  L2(IR2)  each  with  a  particular  orienta¬ 
tion  selectivity.  However  neural  network  applications  do  not  necessarily  require  such 
orientation  selective  wavelets.  In  this  case,  it  is  possible  to  use  translates  and  dilates 
of  a  single  ‘isotropic’  function  to  generate  wavelet  bases  or  frames  for  L2(lRn)  (c.f. 
[Mal89b]).  Figures  5.9-5.10  show  both  an  isotropic  mother  wavelet  and  an  orientation 
selective  mother  wavelet  for  L2(IR2)  which  are  implemented  in  a  standard  feedfor¬ 
ward  neural  network  architecture  with  sigmoidal  activation  functions.  The  wavelets  of 
Figures  5.9-5.10  are  implemented  by  taking  differences  of ‘bump’  functions  which  are 
generated  using  a  construction  given  by  Cybenko  in  [Cyb88].  As  before,  if  we  let  s(-) 
denote  the  sigmoidal  activation  function,  Cybenko’s  construction  gives  the  following 
formula  for  a  N-dimensional  bump  function, 


where  x  €  IRN,  and  X{  denotes  the  ith  component  of  x.  This  is  the  N-dimensional 
equivalent  of  the  ‘bump’  function  ip  (see  (5.2.4)),  of  the  one- dimensional  case.  The 
difference  is  that  two  hidden  layers  are  required  in  the  general  N-dimensional  case. 
Figure  5.11  shows  the  feedforward  network  implementation  of  a  two-dimensional  bump 
function.  Admissible  analyzing  wavelets  for  L2(1RN)  may  be  constructed  by  taking 
appropriate  differences  of  such  bump  functions.  Thus  the  analysis  result  of  Theorem 
5.1  is  easily  extended  to  higher  dimensions  in  this  manner  (with  the  appropriate 
modification  for  two  hidden  layers). 

Remark:  It  is  important  to  note  however,  that  as  the  dimension  N  grows,  the  size 
of  the  network  grows  quite  dramatically.  This  is  primarily  due  to  the  number  of 
sigmoidal  nodes  required  to  implement  a  single  wavelet.  Thus,  although  there  are  no 
theoretical  problems  with  this  formulation,  practical  considerations  may  weigh  quite 
heavily  in  it’s  use.  This  already  suggests  that  if  the  wavelet  formulation  is  to  find 
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Figure  5.9:  Two-Dimensional  isotropic  wavelet  constructed  from  sigmoids  in  feedfor¬ 
ward  network  architecture 


Figure  5.10:  Two-Dimensional  orientation  selective  wavelet  constructed  from  sigmoids 
in  feedforward  network  architecture 
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Figure  5.11:  Feedforward  network  implementation  of  two-dimensional  ‘bump’  function 
using  sigmoidal  activation  functions. 

practical  use  in  feedforward  neural  networks,  it  is  perhaps  necessary  to  consider  other 
activation  functions  which  more  naturally  give  rise  to  N-dimensional  wavelets. 


5.3  Synthesis  of  Feedforward  Neural  Networks  Using 
Wavelets 

In  Section  5.2.1,  it  was  shown  that  affine  frames  for  L2(1R)  can  be  constructed  based 
on  analyzing  wavelets  0  which  are  constructed  as  combinations  of  sigmoidal  func¬ 
tions.  From  this  was  derived  the  analysis  result  Theorem  5.1  which  states  that  any 
function  in  L2(1R)  may  be  approximated  by  a  feedforward  network  with  one  hidden 
layer  and  sigmoidal  activation  functions.  This  result  is  analogous  to  the  existence 
results  mentioned  in  the  beginning  of  this  chapter.  There  is  however  an  additional 
statement  in  Theorem  5.1.  Namely,  given  a  particular  function  in  L2(IR)  ,  its  exact 
feedforward  network  implementation  is  given  by  its  wavelet  decomposition.  In  this 
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section,  we  shall  examine  some  implications  of  the  wavelet  formalism  for  functional 
approximation  based  on  sigmoids,  in  the  synthesis  of  feedforward  neural  networks. 
As  was  described  in  Section  5.1.1,  sigmoidal  functions  have  served  as  the  basis  for 
functional  approximation  by  feedforward  neural  networks.  However,  in  the  absence  of 
an  adequate  theoretical  framework,  topological  definitions  of  feedforward  neural  net¬ 
works  have  for  the  most  part  been  trial-and-error  constructions.  We  will  demonstrate, 
by  means  of  the  simple  network  discussed  in  Section  5.2,  how,  it  is  possible  to  incorpo¬ 
rate  the  joint  time  and  frequency  domain  characteristics  of  any  given  approximation 
problem  into  the  initial  network  configuration. 

Let  /  €  L2(IR)  be  the  function  which  we  are  trying  to  approximate.  In  other 
words,  we  are  provided  a  set  0  of  sample  input-output  pairs  under  the  mapping  /, 

0  =  {(z\</‘) :  y%  =  /(**);  *\yx  e  JR}, 

and  we  would  like  to  obtain  a  good  approximation  of  /.  To  perform  the  approximation 
using  a  neural  network,  the  first  step  is  to  decide  on  a  network  configuration.  For  this 
problem,  it  is  clear  that  the  input  and  output  layers  must  each  consist  of  a  single 
node.  The  remaining  questions  are  how  many  hidden  layers  should  we  use  and  how 
many  nodes  should  there  be  in  each  hidden  layer.  These  questions  can  be  addressed 
using  the  wavelet  formulation  of  the  last  section.  We  consider  a  network  of  the  form 
in  Figure  5.4,  i.e.  with  a  single  hidden  layer.  At  this  point,  a  traditional  approach 
would  entail  fixing  the  number  of  nodes  N,  in  the  hidden  layer  and  then  applying 
a  learning  algorithm  such  as  backpropagation  (described  in  Section  5.1.1)  to  adjust 
the  three  sets  of  weights,  input  weights  {tnojjyLi ,  output  weights  {wj^+ i}yLi>  and 
the  biases  {Ij}.  We  would  like  to  use  information  contained  in  the  training  set  0  to, 
(1)  decide  on  the  number  of  nodes  in  the  hidden  layer,  and  (2)simplify  the  training 
algorithm. 

Here  we  describe  two  possible  schemes  for  use  of  the  wavelet  transform  formulation 
in  the  synthesis  of  feedforward  networks.  The  first  scheme  captures  the  essence  of  how 
time-frequency  localization  can  be  utilized  in  the  synthesis  procedure.  However,  this 
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scheme  is  difficult  to  implement  when  considering  high  dimensional  mappings  and 
in  most  cases  will  result  in  a  network  that  is  far  larger  than  necessary.  We  also 
outline  a  second  method  which  further  utilizes  the  time-frequency  localization  offered 
by  wavelets  to  reduce  the  size  of  the  network.  This  second  method  is  conceivably  a 
more  viable  option  in  the  case  of  higher  dimensional  mappings. 

5.3.1  Network  Synthesis:  Method  I 

Let  /  G  L2(IR)  be  the  function  we  are  trying  to  approximate,  and  as  before,  let 

)  —  T'min *  -1-  max]  i 

denote  the  frequency  concentration  2  of  /.  Also  assume  that  there  exists  a  finite 
interval 

•R(/)  ~  [^min >  2-max] > 

in  which  we  wish  to  approximate  /.  Our  network  synthesis  procedure  is  described  in 
algorithmic  form  below. 


Synthesis  Algorithm: 

Step  I  Our  first  step  is  to  perform  a  frequency  analysis  of  the  training  data.  In 
this  step  we  wish  to  obtain  an  estimate  of  the  frequency  concentration  fi(/),  of 

/  based  on  the  data  contained  in  0.  A  number  of  techniques  can  be  considered  for 
performing  this  estimate.  We  will  not  elaborate  on  such  techniques  here.  Let  £>,„;»  be 
our  estimate  of  minjn,  and  u3lliax  be  our  estimate  of  u>max. 

Step  II  We  now  use  the  knowledge  of  ft(/)  and  R(f),  to  choose  the  particular  frame 
elements  to  be  used  in  the  approximation.  The  main  idea  in  this  step  is  to  choose  only 
those  elements  of  the  frame  {ipmn}  which  ‘cover’  the  region  Qj  of  the  time-frequency 
plane  defined  by 

Qf  =  R(f)  X  fi(/), 

2Assuming  /  is  real-valued,  we  need  only  consider  positive  frequencies. 
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which  represents  the  concentration  of  /  in  time  and  frequency  as  determined  from  the 
data  0.  Recall  that,  the  time  frequency  concentration  of  the  wavelets  't/’TO,n  is  denoted 

by, 

Qm.n ( V' )  =  ^m,n(Vl)  X  1  ^ni ,n  (  V ) • 

Figure  5.12  shows  the  location  of  Q /,  and  the  Q,  ,n’s  together  with  the  time-frequency 
concentration  centers  (zc(^m)n),wc(|^m)n|2)  of  the  frame  elements.  Therefore  to 
‘cover’  Q/(e,e)  we  need  to  determine  the  index  set  I  of  pairs  (m,  n)  of  integer  trans¬ 
lation  and  dilation  indices  such  that, 

Qm,n  nQ//i,  for  (m,  n)  G  I. 

Step  III  Given  I,  it  is  now  possible  to  configure  the  network.  From  the  manner  in 
which  I  is  defined,  we  expect  to  be  able  to  obtain  an  approximation  to  /  of  the  form 

/(•*")  ~  ^  1  0n,nt^m,n(®)  —  /IT)'  ( 0 . 3 . 8) 

(m,n)sl 

for  x  G  [xmin,  xmax] •  The  approximation  error  in  (5.3.8)  can  be  made  arbitrarily  small 
by  allowing  e  and  e  to  go  to  zero  in  the  computation  of  the  various  concentration 
regions  used  to  define  the  sets  Qj  and  Qmn.  This  is  because  we  know  that  {V’m.n}  is 
a  frame  and  therefore  it  is  possible  to  write  /  as 

/(x)  =  ^  cm<n{f)i> m,n  (5.3.9) 

m,neZ 

for  some  coefficients  {cm>n(/)}.  Returning  to  the  single-hidden  layer  feedforward 
network  shown  in  Figure  5.4,  choose  the  number  of  nodes  in  the  hidden  layer  to  be 
equal  to  the  number  of,  elements  in  1,  i.e.  N  =  #(I)  where  the  activation  function 
of  each  node  is  taken  to  be  V;  (as  in  5.2.5).  Now  if  we  set  the  weights  from  the  input 
node  to  the  hidden  layer  and  the  biases  on  each  hidden  layer  node  to  the  dilation  and 
translation  coefficients  indexed  by  (m,n)  El,  then  the  output  of  the  network  can  be 
written  as 

y  =  c7n,nrm,„(x)  (5.3.10) 

(ra,n)(=I 
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where  x  is  the  input  of  the  network  and  cm  n’ s  are  the  weights  from  the  hidden  layer 
to  the  output  node.  We  have  therefore  obtained  a  network  configuration  which  d  nines 
an  output  function  (5.3.10)  that  is  exactly  of  the  form  required  to  approximate  the 
function  /  (Equation  (5.3.811 

it  remains  to  determine  the  coefficients  cm<n's  in  (5.3.10)  that  will  result  in  the 
d<  -ired  approximation. 

5.3.2  Network  Synthesis:  Method  II 

The  synthesis  algorithm  described  above  in  Section  5.3.1  uses  identification  of  an 
‘important’  region  Qy  of  the  time-frequency  plane.  Critical  to  identification  of  this 
region  is  the  ‘bandwidth’  estimate  made  in  Step  I.  There  are  two  significant  drawbacks 
of  making  such  a  bandwidth  estimate: 

(1)  Estimation  of  spectral  concentration  of  signals  in  high  dimensions  is  computa¬ 
tionally  expensive. 

(2)  Any  estimate  of  spectral  concentration  which  relies  on  Fourier  techniques  is 
going  to  generate  a  generalized  rectangle  in  joint  time-frequency  space.  For 
many  functions  such -a  rectangular  concentration  in  time-frequency  is  simply  an 
artifact  of  the  spatial  nonlocality  of  the  Fourier  basis.  For  example,  an  estimate 
of  the  frequency  concentration  of  the  signal  in  Figure  2.1  will  generate  a  rectangle 
in  time-frequency  as  the  concentration  of  the  signal.  If  we  then  use  this  rectangle 
to  choose  which  elements  of  a  wavelet  basis  to  use  to  approximate  the  signal, 
the  time-frequency  rectangle  will  dictate  that  large  dilations  (corresponding  to 
high  frequencies)  of  the  w'avelets  be  used  over  the  entire  time  interval.  However, 
since  each  wavelet  is  also  localized  in  time,  and  high  frequency  components  of 
the  signal  are  localized  as  well,  this  is  clearly  an  excessive  number  of  wavelets. 
Large  dilations  can  be  used  locally  where  needed. 

Spatio-spectral  localization  properties  of  wavelets  can  be  further  exploited  to  reduce 
the  number  of  network  nodes  (wavelets)  used  in  the  approximation.  The  basic  idea 


140 


is  that  since  wavelets  are  well-suited  to  identify  spatially  local  regions  of  fine  scale 
(high  frequency)  features  in  a  signal,  locations  and  values  local  maxima  of  the  wavelet 
approximation  coefficients  at  one  scale  (dilation)  indicate  whether  or  not  it  is  necessary 
to  locally  renne  the  V-  '  l-.-.  --ee  of  wavelets  at  finer  scales  (c.f.  [MH]). 

A  network  synthesis  algorithm  using  this  idea  would  be  an  adaptive  proceauit.  cf  rPo 
following  form. 

(1)  boustruct  and  train  a  network  to  approximate  the  mapping  at  some  scale  a™ 
over  the  entire  spatial  region  of  interest. 

(2)  Identify  local  maxima  of  the  wavelet  coefficients  and  locally  refine  the  approxi¬ 
mation  by  adding  new  dilations  (nodes)  to  the  network  where  needed. 

(3)  Repeat  (2)  until  some  stopping  criterion  has  been  satisfied. 

Using  a  scheme  such  as  this  would  result  in  approximations  being  performed  over 
regions  of  time-frequency  of  the  form  shown  in  Figure  5.13.  Some  aspects  of  this 


Figure  5.13:  Form  of  time-frequency  coverage  from  approximation  scheme  of  Section 
5.2. 

scheme  are  discussed  in  [Pat91]. 
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5.3.3  Computation  of  Coefficients  (Training) 

Variational  computation  of  wavelet  coefficients  based  on  training  data 

Although  the  problem  of  determining  the  wavelet  coefficients  in  a  finite  approximation 
can  be  well  formulated,  we  know  of  no  analytic  solution  to  the  problem  of  explicitly 
computing  the  coefficients,  given  only  (possibly  irregularly  spaced)  samples  of  the 
function.  We  can  however  formulate  tne  coefficient  computation  problem  as  a  varia¬ 
tional  principle  in  a  fashion  analogous  to  learning  algorithms  such  as  backpropagation. 
We  define  our  cost  functional  to  be 

E=  E  ||o*'  -  yil2  =  E  I  E  wV>m, «(**')  -  y{\2,  (5.3.11) 

(x',y')€Q  (x‘,y')e&  (m,n)el 

where  0l  is  the  output  of  the  network  when  x 1  is  the  input  as  in  Section  5.1.1.  We 
choose  the  wavelet  coefficients  as  those  which  minimize  E.  As  a  result  of  the  wavelet 
formulation,  the  weights  to  be  determined  appear  linearly  in  the  output  equation  of 
the  network.  Thus  E  is  a  convex  function  of  the  coefficients  {cm,n}  and  therefore  any 
minimizer  c*  =  {c*n  n}jm  of  E  is  a  global  minimizer.  Simple  iterative  optimization 
algorithms  such  as  gradient  descent  can  be  used  to  minimize  E. 

Remark:  Since  the  cost  functional  (5.3.11)  defines  a  linear  least  squares  problem, 
the  training  problem  may  also  be  solved  via  the  normal  equations  as  discussed  in 
Chapter  2. 

5.3.4  Simulations 

As  a  test  of  the  neural  network  synthesis  procedure  described  above,  we  simulated 
a  few  simple  examples.  As  a  first  test  we  chose  the  bandlimited  function  comprised 
of  two  sinusoids  at  different  frequencies,  specifically  f(x)  —  sin( 2ttox)  +  sin(‘2~ lOx) 
which  is  shown  in  Figure  5.14.  Taking  x„,;n  =  0.0  and  xmax  =  0.3,  50  randomly  spaced 
samples  of  the  function  were  included  in  the  training  set  0.  A  single  dilation  of  the 
mother  wavelet  was  chosen  (m  =  6)  which  covered  the  frequency  range  adequately 
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Figure  5.14:  Original  bandlimited  function  /( x)  =  sin(27r5z)  +  sin(27rl0z),  (solid 
curve),  and  finite  wavelet  approximation  (dashed  curve). 

(see  Figure  5.15).  Translations3  of  this  dilation  of  ip  which  contributed  significantly 
in  the  interval  [znijn,  zmax]  were  used,  resulting  in  40  hidden  units.  Applying  a  simple 
gradient  descent  scheme  to  minimize  E ,  an  approximation  to  /  was  obtained.  The 
resulting  approximation  is  shown  in  Figure  5.14  along  with  the  original  function. 

A  second,  slightly  more  complicated,  example  was  simulated  by  first  generating  a 
random  spectrum  (Figure  5.16)  which  is  concentrated  in  frequency  and  then  sampling 
the  corresponding  function  in  the  time  domain.  The  result  of  this  simulation  using 
again  just  one  dilation  of  the  mother  wavelet  is  shown  in  Figure  5.17. 


3These  translations  were  integer  multiples  of  the  translation  stepsize  b. 


10 


-10 - 1 - 1 - ' - 1 - 1 - ' - - - i - i - 1 

-0.5  -0.4  -0.3  -0.2  -0.1  0  0.1  0.2  0.3  0.4  0.5 

time  (seconds) 


Log  Frequency  (Hz) 

Figure  5.15:  (Top)  Wavelet  F>m,n  for  n  =  0,  m  =  6,  (Bottom)  Square  magnitude  of 
Fourier  transform  of  ipmn  (n  =  0,m  =  6). 

5.4  Summary 

We  have  demonstrated  that  it  is  possible  to  construct  a  theoretical  description  of  feed¬ 
forward  neural  networks  in  terms  of  wavelet  decompositions.  This  description  follows 
naturally  from  the  inherent  translation  and  dilation  structure  of  such  networks.  The 
wavelet  description  of  feedforward  networks  easily  characterizes  the  class  of  mappings 
which  can  be  implemented  in  such  architectures.  Although  such  characterizations  have 
been  previously  provided  in  a  number  of  different  forms  [Cyb89,  Cyb88,  HSW89],  to 
our  knowledge,  no  previous  characterization  using  sigmoidal  activation  functions  is 
capable  of  defining  the  exact  network  implementation  of  a  given  function.  What  is 
distinctly  different  about  the  wavelet  viewpoint  is  that  it  provides  an  extremely  flexi- 
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Figure  5.16:  Frequency-concentrated  ‘random’  spectrum. 

ble  (not  necessarily  orthogonal)  transform  formalism.  This  flexibility  has  been  utilized 
in  this  chapter  to  construct  a  transform  based  upon  combinations  of  sigmoids.  Once 
again,  we  would  like  to  point  out  that  in  general  there  is  nothing  special  about  sig¬ 
moidal  functions  and  that  a  variety  of  different  activation  functions,  including  e.g. 
orthogonal  wavelets  can  be  of  significant  interest.  Sigmoidal  functions  however  hold 
one  attraction;  such  functions  can  be  easily  implemented  in  analog  integrated  circuitry 
(see  e.g.[Mea89j).  Aside  from  this,  we  have  chosen  to  work  with  sigmoidal  functions 
only  to  demonstrate  the  general  methodology  that  can  be  applied  in  the  context  of 
feedforward  neural  networks. 

In  addition  to  providing  a  theoretical  framework  within  which  to  perform  analysis 
of  feedforward  networks,  the  wavelet  formalism  supplies  a  tool  which  can  be  used  to 
incorporate  spatio-spectrai  information  contained  in  the  training  data  in  structuring  of 
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Figure  5.17:  Frequency-concentrated  signal  corresponding  to  random  spectrum  in 
Figure  14  (solid  curve),  and  finite  wavelet  approximation  (dashed  curve). 

the  network.  Two  possible  schemes  to  perform  this  task  were  described  in  Section  5.3. 
Minimality  in  terms  of  the  number  of  nodes  in  the  network  cannot  be  guaranteed  using 
these  methods4.  However,  it  is  possible  to  estimate  the  approximation  error  ([Dau90]) 
in  terms  of  the  signal  energy  lying  outside  the  chosen  spatio-spectral  region. 

Here,  attention  has  been  primarily  restricted  to  approximating  functions  in 
L2(IR)  .  Most  applications  where  neural  networks  are  particularly  useful  involve  map¬ 
pings  in  higher  dimensional  domains  (e.g.  in  vision,  robot  motion  control,  etc).  Al¬ 
though  extensions  of  the  methods  of  this  chapter  to  higher  dimensions  are  possible 
(as  described  in  Section  5.2.3),  such  extensions  have  the  potential  to  be  computation- 

4This  problem  of  large  networks  is  particularly  limiting  when  considering  mappings  in  higher 
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dimensions 


ally  expensive.  We  are  currently  studying  the  formulation  of  more  computationally 
viable  synthesis  techniques  for  approximation  of  higher  dimensional  mappings  using 
feedforward  neural  networks. 

Using  the  wavelet  formalism  to  synthesize  networks  results  in  a  greatly  simpli¬ 
fied  training  problem.  Unlike  the  situation  in  traditional  feedforward  neural  network 
constructions,  the  cost  functional  is  convex  and  thereby  admits  global  mini,  izing  so¬ 
lutions  only.  Convexity  of  the  cost  functional  is  a  result  of  fixing  the  weights  in  the 
arguments  of  the  nonlinearities  so  as  to  provide  the  required  dilations  and  transla¬ 
tions.  Simple  iterative  solutions  to  this  problem  such  as  gradient  descent  are  thus 
justifiable  and  are  not  in  danger  of  being  trapped  in  local  minima. 
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Chapter  6 


Conclusions  and  Discussion 


In  this  dissertation  we  have  examined  two  problems,  both  of  which  may  be  considered 
under  the  heading  of  approximation  theory.  Specifically  we  considered  the  problems  of 
(i)  rational  approximation  and  identification  of  stable  linear  systems,  and  (ii)  develop¬ 
ing  a  theoretical  framework  for  analysis  and  synthesis  of  feedforward  neural  networks. 
These  problems  were  both  addressed  within  the  nonclassical  setting  of  frames  and 
time-frequency  localized  representations.  The  key  argument  to  be  made  in  support 
of  the  use  of  frames  rather  than  orthonormal  systems  is  the  tremendous  flexibility  of¬ 
fered  by  frame  theory  in  the  selection  of  ‘basis’  functions.  In  many  applications  such 
as  those  considered  in  this  dissertation,  the  choice  of  ‘bases’  with  suitable  properties 
is  more  important  than  orthogonality  or  even  linear  independence.  The  role  played 
by  time-frequency  localization  in  these  applications  is  one  of  providing  a  systematic 
procedure  for  approximation. 

In  Chapter  3  it  was  shown  that  frames  for  the  Hardy  space  H2(n+),  may  be 
constructed  via  dilations  and  translations  of  a  single  rational  analyzing  wavelet.  Fur¬ 
thermore,  a  characterization  of  all  rational  H2(n+)  analyzing  wavelets  was  given  in 
Theorem  3.4.  One  of  the  main  results  of  Chapter  3  is  Theorem  3.3,  which  states  that 
transfer  functions  in  Hfl(n+),  may  be  represented  as  infinite  sums  of  time-frequency 
localized  real-rational  functions  of  bounded  McMillan  degree.  The  representation  of 
Theorem  3.3  arises  via  a  regrouping  of  terms  in  wavelet  series  representations,  where 


148 


the  analyzing  wavelet  is  chosen  to  be  real-rational.  The  series  obtained  via  this  re¬ 
grouping  is  called  a  wavelet  system  ( WS )  decomposition.  Wavelet  system  decomposi¬ 
tions  are  a  means  of  representing  causal  LTI  systems  with  square-integrable  weighting 
patterns  as  parallel  combinations  of  time-frequency  localized  finite-dimensional  sys¬ 
tems.  The  key  property  of  such  decompositions,  which  makes  them  useful  in  problems 
of  rational  approximation  is  the  property  of  time-frequency  localization.  There  are 
of  course  systems  for  which  time-frequency  localization  may  offer  no  significant  ad¬ 
vantage  in  the  construction  of  low-order  approximants,  e.g.  systems  with  very  large 
‘bandwidths’,  and  slowly  decaying  impulse  responses.  However,  for  an  important  class 
of  physical  systems,  time-frequency  localization  can  lead  to  compact  representations. 
For  such  systems,  we  showed  that  rational  approximants  of  fairly  low  order  may  be 
obtained  by  suitable  truncations  of  the  WS  series.  Furthermore  we  showed  that  min¬ 
imal  state  space  realizations  for  the  approximating  systems  may  be  generated  under 
easily  verified  hypotheses  on  the  poles  of  the  analyzing  wavelet.  A  coarse  bound  on  the 
approximation  error  was  derived  based  on  the  l2  norm  of  the  expansion  coefficients. 
The  methods  of  Chapter  3  were  illustrated  by  the  example  of  a  nonrational  transfer 
function  arising  from  the  heat  equation  with  Dirichlet  boundary  control. 

In  Chapter  4  we  considered  the  use  of  truncated  WS  series  as  black-box  rational 
parametric  models  in  problems  of  system  identification.  In  this  case  as  well,  the  key 
property  of  of  WS  representations  was  shown  to  be  time-frequency  localization.  It  was 
shown  that  time-frequency  localization  provides  a  convenient  means  of  incorporating 
a  priori  information  about  the  unknown  system  into  the  formal  properties  of  the 
parametric  model.  Moreover,  time  and  frequency  domain  a  priori  knowledge  may 
be  treated  simultaneously  via  WS  models.  We  made  both  qualitative  and  numerical 
comparisons  of  WS  models  and  the  classical  Laguerre  filter  models.  It  was  shown  by 
examples  that  the  performance  of  WS  models  can  be  significantly  better  than  Laguerre 
models  for  some  commonly  encountered  classes  of  systems.  The  main  difficulty  with 
the  Laguerre  models  is  due  to  the  fact  that  too  much  emphasis  is  placed  on  a  single 
parameter,  namely  the  Laguerre  shift  parameter  p.  This  difficulty  has  been  noted  on 
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a  number  of  occasions  in  the  past.  We  showed  that  the  improvement  on  the  Laguerre 
model  suggested  by  Wahlberg  [Wah91]  which  involved  the  use  of  a  finite  number  of 
Laguerre  shift  parameters  pj,  may  be  formally  interpreted  in  the  context  of  frames. 

In  Chapter  5  we  demonstrated  that  it  is  possible  to  construct  a  mathematical 
description  of  feedforward  neural  networks  via  affine  wavelet  theory.  In  this  descrip¬ 
tion  the  inherent  translation  and  dilation  structure  of  such  networks  is  utilized.  The 
wavelet  description  of  feedforward  networks  identifies  L2  as  a  class  of  mappings  which 
can  be  implemented  in  such  architectures.  The  flexibility  of  frame  theory  was  utilized 
in  constructing  wavelet  transforms  based  on  sigmoidal  functions.  It  was  pointed  out 
that  we  use  sigmoidal  functions  here  mainly  to  demonstrate  the  general  methodol¬ 
ogy,  and  to  show  that  this  methodology  is  applicable  to  the  most  commonly  used 
activation  function.  A  key  difference  between  the  wavelet  formulation  and  previous 
characterizations  of  the  approximating  properties  of  feedforward  networks  with  sig¬ 
moidal  activation  functions,  is  the  (nonorthogonal)  transform  formalism  which  gives 
the  exact  feedforward  network  implementation  of  a  given  mapping. 

We  also  discussed  the  use  of  spatio-spectral  localization  properties  of  of  affine 
wavelets  as  a  tool  for  the  synthesis  of  feedforward  networks,  and  showed  that  such 
synthesis  schemes  lead  to  convex  training  problems. 


Appendix  A 


Proof  of  Daubechies’  Theorem 


Since  this  dissertation  makes  extensive  use  of  Theorem  2.6  and  its  Corollary,  we  include 
the  proof  for  the  sake  of  completeness.  The  proof  also  illustrates  the  applicability  of 
this  theorem  in  the  case  of  the  Hardy  space  H2.  Proof  of  Theorem  2.6  relies  on  the 
Poisson  summation  formula  which  we  state  below. 


Lemma  A.l  (Poisson  Summation  Formula)  Let  f  denote  the  Fourier  transform 
off.  Then 

E  /(<  +  »t)  =  i  e  > 

ngZ  neJL 

A  commonly  used  form  of  the  Poisson  summation  formula  is, 
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Applying  the  Poisson  summation  formula  to  the 


sum  over  n. 
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Separating  out  the  k  =  0  term, 
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By  first  applying  the  Cauchy-Shwarz  inequality  to  the  integral  on  the  right,  and  then 
to  the  sum  over  rn,  we  get 
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Hence  by  hypothesis  (3)  there  exists  50  >  0,  such  that 
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which  gives  the  lower  frame  bound  A  >  0.  The  upper  frame  bound  follows  in  a  similar 
fashion.  ■. 
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Appendix  B 


Proofs  from  Chapter  3 


Proof  of  Proposition  3.2 

Without  loss  of  generality  we  can  consider  the  case  m  =  0.  Therefore  we  need  to 
check  that  for  b  6  1R, 


f(Lj)g(u>  +  b)dco  -  f(u)g(u  -  b)dcoj  =  0.  (2.0.1) 

Consider  the  real  and  imaginary  parts  of  the  l.h.s.  of  Equation  (2.0.1)  separately. 

Real  part  Taking  the  real  part  of  Equation  (2.0.1)  , 


Ke  f(u)g(u  +  b)d'jJ  -  ^ J ^  f(u)g(u  -  b)du 

—  $ie  (J  f(u)g(u  +  b)dui  —  J  f(u)g(u)  —  h)du/j 

=  ifte  (^J  f(u)  (g(u  +  b)  -  g(u  -  b))  du  -f  /( u>)  (g(u  +  b)  -  g(ui  -  b ))  d^j 

=  Ke  (-  J  /(— w)  (g(—uj  +  b)  —  g(—u  —  b))  du 

+  Jo  +  b)  -  g(uj  -  b))  du?j 
Uo  ^~u^9^u  ~  +  6)))dw 

POO 

+  f{u)(g(u  +  b)-g(u-b))d 

Jo 


=  iRe 


=  5Re 


/ 

POO  _ 

POO 

\ 

1  f(u)  (ff(u>  -  b)  -  g(u  +  b))di0  + 

/  /(w)  +  6)  - 

1  V 

—  6))  d^j 

V 

H{w) 

/  (J  v 

-H(w) 

/ 
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=  Ue 


du)  —  0 


Imaginary  part 

Im  (  Jr  +  b^diJ  ~  /(WMW  ~  b)di 

=  Im  (  /  f{u)g(u  +  b)du  +  [  f(u).'u  —  b)du 
\J  R  J  R 

(r  0  yoo 

/  f{u)(g(u+b)  +  g(u-b))du+  /(w)  (#(<*;  +  6)  +  $r(u?  -  6)) 

J-oo  Jo 

=  Im  (- J  f(-u)(g(-u  +  b)  +  g(-u  -  b))du 
+  /(<*>)  +  6)  +  g(u  -  b ))  dw^ 

(ii  /^-a;)(5(-(w-&))  +  fir(-(w  +  fc)))^ 

roo 

/  /(w)  (<7(w  +  b)  +  g(u  -  b)) 

Jo 

roo roo 

/  f(u)(g(u-b)  +  g(u  +  b))du  +  f(u)(g(u  +  b)  +  g(u-b))du 

JO  ' - - - '  Jo  *• - v - ' 

V  H(w) 

=  Im  (J  if  (  in)  +  H(u)  du  )  =  0 

Which  proves  the  proposition  ■. 


=  Im 


+ 

/ 


)  du 


=  Im 


H(u) 


B.l 

Proof  of  Theorem  3.5 

We  outline  here  the  proof  of  Theorem  3.5. 

Proof  of  part  (a):  Since  (3.4.28-3.4.30)  is  a  2JV-dimensional  realization  of  Gm,n(s), 
we  need  only  prove  that  the  rational  function  Gm,n(s)  is  of  order  2 N  for  all  n  6  Z, 
n  ^  0  if  and  only  if  (3.4.36)  holds. 

The  poles  of  Gm,n(s )  (see  (3.4.18))  are, 

{Vh(m,  n),  rjk(m,  n),  vk(m,  n),Vj£m,  n)} 
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where  i]k(m,  n)  and  vk(m,  n)  are  defined  by  (3.4.17).  Assuming  (3.4.36)  holds,  %(m.  n) 
and  vk(m,n)  have  nonzero  imaginary  parts  for  all  7 77,77  G  2.  Therefore,  at  3  = 
n)  for  example,  the  term, 

II(5  ~  Vk(m,  n))(s  -  ^(m,  77)) 
k 

whieh  appears  in  the  numerator  polynomial  lVm,n(s)  of  Gm'n  will  be  nonzero.  Further¬ 
more,  the  assumption  that  \I>  is  of  order  N,  implies  that  pj  ^  zk  for  all  z,  k.  Therefore 
at  s  =  J?fc(m,  n)  the  remaining  (nonzero)  term  in  the  numerator  of  Gm'n  will  be, 

Nm,n(Vk(m,n))  =  a  JJ(s  -  n))(s  -  jj(m,  n))  JJ(s  -  Vk(m,  n))(s  -  Vk(m,  n)), 

3  k 

which  shows  that  Gm'n  has  no  zeros  at  7/^(777,77).  Exactly  the  same  argument  can  be 
used  to  show  that  there  is  no  cancellation  with  zeros  for  the  remaining  poies.  Thus 

Qm,n  jg  Qf  orc[er  21V. 

The  converse  is  readily  proved  using  a  similar  pole-zero  cancellation  argument.  ■ 


Proof  of  part  (b):  By  hypothesis,  (A,n>n,  Cm>n)  is  a  minimal  realization  of 
Gm'n(s )  =  A m,nW/^m,n(s)  for  all  (777, 77)  G  J.  Therefore  Nm<n  and  Dm<n  are  coprime 
polynomials.  Consider  the  parallel  combination  of  two  WS  transfer  functions. 


Gj{s)  =  Gmuni{s )  +  Gm2,n2{s), 

where  J  ={(mu  m),  (m2,  n2)}.  In  this  case  (Nmi>ni  (3),  Dni2>n2 
(iV,n2  ,n2(s)Dmuni(s)),  are  relatively  coprime  pairs.  Rewriting  (2.1.2), 


Gj(s) 


Nj(s) 

Dj(s) 


(2.1.2) 
(s))  and 


_  (  s)Pm2  (f)  4~  AlTO2,n2 {s)Dyii (3) 

Dmi,n\  (■s)Pm.2,n 2  (5) 


Therefore  Nj  and  Dj(s)  will  be  relatively  coprime  if  and  only  if  Dmum  and  Dm!i„2(j) 
are  relatively  coprime.  The  roots  of  DmuUl  and  £>TO2,„2(s)  are  given  by  (3.4.18).  Thus 
Dmi,ni  and  Dra2i„2(s)  will  have  a  common  root  if  and  only  if, 


«o  1  (Pi  +  inib0)  —  a0  ni7(pj  ±  in2b0), 
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for  some  l  andj.  Solving  for  v,\ , 

=  (aoM-m2(3?e  pj  +  ilm  pj)  -  (3?e  pi  +  ilm  pi)j  ±  a“1_m2n2 

=  ^  ^1_m2Irn  pj  -  1m  pi )  ±  a™1_ra2n2^ 

-f  (oS*1_m2»ePi-»cp,)  (2.1.3) 

»o  v  ' 

However,  by  hypothesis  (3.4.37)  there  can  be  no  integer  solution  ri\  to  (2.1.3)  since 
the  imaginary  part  of  (2.1.3)  will  always  be  nonzero.  The  above  argument  for  two  WS 
transfer  functions  is  easily  extended  to  any  finite  number  of  WS  transfer  functions  in 
parallel.  ■ 
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Appendix  C 


Some  Intuition  on  Frames 


This  appendix  is  included  to  help  develop  intuition  and  clarify  some  of  the  properties  of 
frames.  A  few  new  results  are  presented  as  well.  Often  the  familiar  finite-dimensional 
Euclidean  spaces  IRN,  serve  as  a  good  place  to  develop  intuition.  For  this  reason,  most 
of  the  examples  below  are  in  the  setting  of  1RN . 

C.l  Simple  Examples  of  Frames 

Let  us  begin  with  a  few  examples  of  frames  in  the  plane  IR2. 

Example  1:  Exact  Frame  with  A  ^  B 
As  a  first  example,  consider  the  vectors, 

xi  =  (l,0)T  and  x-2  —  (sin  8,  cos  9). 

Then,  {x\,x2},  is  a  frame  for  IR'2  with  frame  bounds 

A  —  1  —  cos  9 

B  —  1  +  cos  9. 

The  vectors  in  this  frame  are  linearly  independent,  and  hence  form  a  Riesz  basis  for 
IR2.  Note  that  with  6  =  n j'2,  we  get  the  standard  orthonormal  basis  for  IR2. 
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Example  2:  Redundant  Tight  Frame 

Often  the  notion  of  tightness  of  a  frame  is  confused  with  the  notion  of  linear  inde¬ 
pendence.  This  next  example  illustrates  the  disjoint  nature  of  these  two  concepts. 
Let, 

x\  =  (i,of 

x2  =  (-1/2,  v/3/2) 

2:3  =  (—1/2,  —a/3/2) 

(see  Figure  C.l).  Clearly  the  above  three  vectors  are  not  linearly  independent,  but 


Figure  C.l:  A  redundant  tight  frame  for  IR2. 

they  do  in  fact  form  a  tight  frame  for  IR2,  with  A  =  B  —  3/2.  Thus  any  vector 
y  G  IR2,  may  be  written  as 

3  2  3 

y=Yj{y'  s~lxk)xk  =  Xk ) Xk- 

k=  1  6  k= 1 

The  interesting  thing  about  this  example  is  that  any  n  unit-length  vectors,  where  n  is 
an  odd,  finite  integer,  will  form  a  tight  frame  for  IR2  if  they  are  distributed  uniformly 
in  the  sense  of  the  angles  9  (see  Figure  C.l)  between  the  vectors  is  6  =  2ir/n.  By  a 
leap  of  faith,  it  is  convenient  to  think  of  tight  frames  as  frames  in  which  the  frame 
elements  are  uniformly  distributed.  It  is  clear  however  that  tightness  does  not  imply 
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linear  independence  in  any  sense  since  by  the  above  example,  a  tight  frame  may  be 
made  arbitrarily  redundant.  Moreover,  linear  independence  does  not  imply  tightness, 
as  is  clear  from  Example  1,  and  also  the  definition  of  Riesz  bases. 


C.2  Frame  Expansion  Coefficients 


In  this  section  we  demonstrate  some  properties  of  frame  expansions,  in  terms  of  the 
expansion  coefficients.  Let  {hn}  be  a  frame  for  a  Hilbert  space  1i ,  and  let  /  £  Ti. 
Consider  the  frame  representation, 

/  =  '£<f,S~lhn  >K.  (3.2.1) 

For  a  redundant  frame,  the  frame  representation  (3.2.1)  is  not  in  general  unique, 
and  moreover,  we  know  by  Theorem  2.5  that  if  {an}  is  another  sequence  such  that 
/  =  En  anK,  then 

E  M2  =  E  I  <  />  S~lhn  >  I2  +  E  I  <  />  S~lbn  >  -«n|2.  (3.2.2) 

The  above  equation  defines  a  hyperplane  in  C2,  and  Theorem  2.5  may  be  interpreted 
as  saying  that  {<  f,S~lhn  >}  is  the  unique  sequence  of  expansion  coefficients  which 
is  orthogonal  to  this  hyperplane.  Every  sequence  {an}  satisfying  (3.2.2),  does  not 
however  comprise  a  representation  of  /,  as  the  following  proposition  states. 


Proposition  C.l  Let  { h k}  be  a  frame  for  H,  and  let  f  £  H,  and  {«„}£  l2  such  that 
(3.2.2)  is  satisfied.  Then  it  does  not  follow  that  f  —  anhn- 

Proof:  Let  Ck  denote  (f,S~lhk),  and  assume  Cj  ^  0  for  some  j.  Now  define  a 
sequence  {afc},  by 


ak  = 


0  j  ±  k 

=  EM2  J  =  k. 


Then  {a^}  satisfies  (3.2.2)  since, 

Eic*-°*i2  =  E!^i2  +  Eic^2-25Re  E« 
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However,  it  is  clear  that  we  in  general  cannot  write 


f  —  }  ®khk  —  (Ijhj. 

unless  /  is  a  multiple  of  the  frame  element  hj.  ■. 


C.2.1  Distribution  of  Coefficients 


An  interesting  property  of  the  coefficients  cn  =  (/,  S~xhn),  is  the  manner  in  which 
they  are  ‘distributed’.  Roughly  speaking,  the  among  all  possible  coefficient  sequences 
representing  the  function  /,  the  sequence  {cn}  is  the  most  ‘uniformly  distributed’. 
This  uniform  distribution  of  coefficients  is  directly  related  to  the  minimum  norm 
property  of  {cn}.  To  clarify  this  statement,  consider  the  following  example  of  a  frame 
for  IR2. 


h\  =  (1,0)T  and  h2  =  I13  =  (0, 1)T. 


Thus  {h\,  h2,  /13}  is  an  inexact  frame  for  IR2  with  A=l,  and  B  =  2.  Thus  if  we 
consider,  the  vector  y  =  (0, 1)T  as  a  test  vector,  then  all  representations  of  y  with 
respect  to  the  frame  {hi,  h2,  /13},  be.  all  {ai, <22,(13}  such  that  y  —  are 

given  by, 

ai  =  0  and  02  +  03  =  1- 


Thus  the  minimum  norm  sequence  is  {01,02,03}  =  {0, 1/2, 1/2}.  This  minimum  norm 
sequence  clearly  makes  the  most  uniform  use  of  the  redundancy  in  the  frame  by 
distributing  the  contributions  of  h2  and  h 3  evenly.  The  most  ‘compact  representation 
of  y  in  this  case  is  {01,02,03}  =  {0,1,0}  (or  {0,0,1})  ,  which  makes  no  use  of  the 
redundancy. 

In  many  applications,  such  as  signal  compression,  it  is  desirable  to  work  with  the 
most  compact  representation.  One  criteria  for  measuring  compactness  of  a  represen¬ 
tation  is  in  terms  of  the  entropy  of  the  coefficients.  For  compactness,  one  would 
like  to  minimize  the  entropy.  This  idea  has  been  employed  by  Coifman  et.  al. 
[CW90,  CMQW90]  for  the  selection  of  a  ‘best  (wavelet)  basis’  for  compression  of 
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signals.  By  the  above  example,  it  is  clear  that  although  the  most  distribute'']  rep¬ 
resentation  is  uniquely  defined,  the  most  compact  representation  is  not  in  general 
uniqtie  for  redundant  frames.  One  drawback  of  selecting  the  most  compact  represen¬ 
tation  with  respect  to  a  redundant  frame,  is  that  the  robustness  properties  of  frame 
representations  (see  Section  2.3.1),  are  lost.  This  is  because  selection  of  the  most 
compact  representation,  involves  eliminating  the  effects  of  redundancy. 

C.3  Addition  of  Frames 

It  is  interesting  to  examine  what  happens  when  frames  for  two  subspaces  M  and  N 
of  a  Hilbert  space  H,  are  combined  and  considered  as  a  frame  for  the  direct  sum 
space  M  ®  N.  For  the  case  in  which  the  frames  are  actually  orthonormal  bases  and 
the  subspaces  M  and  N  are  orthogonal  to  one  another,  it  is  well-known  that  the  the 
union  of  the  two  frames  forms  an  orthonormal  basis  for  the  orthogonal  direct  sum 
space  M  ©  N .  However,  in  the  general  setting  of  frames,  and  subspaces  which  are  not 
orthogonal  to  one  another,  some  fairly  unexpected  things  can  happen. 

Here,  we  obtain  results  which  rely  on  the  subspaces  M,  and  N  being  disjoint.  We 
show  in  Section  C.3.1  that  given  subspaces  M  and  N  of  a  Hilbert  space,  and  frames 
associated  with  each  of  the  subspaces,  the  union  of  the  two  frames  is  a  frame  for 
M  ®  N  whenever  the  minimum  angle,  6m  between  the  two  subspaces  is  bounded  away 
from  zero.  We  also  show  that  the  lower  frame  bound  for  the  union  of  the  two  frames 
can  be  bounded  below  in  terms  of  the  quantity  (1  -  cos0m),  and  bounded  above 
by  the  minimum  of  the  lower  frame  bounds  associated  with  the  frames  for  M  and 
N  individually.  Results  of  this  nature  are  useful  when  considering  truncated  frame 
approximations.  Some  simple  examples  in  which  the  frame  bounds  can  be  explicitly 
computed  are  provided  to  demonstrate  accuracy  of  the  frame  bound  estimates. 

C.3.1  Frames  Generated  by  Subspace  Addition 

Definition  C.l  Given  two  subspaces  M  and  N  of  a  Hilbert  space  the  smallest 
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angle  0m  between  M  and  N  is  defined  as, 

cos9m  =  sup  sup  |  <x,y> 

16M  y£N 

1141=1  IMI=i 

where  9m  £  [0,7 r/2]. 


We  shall  make  use  of  the  following  Theorem  in  proving  the  main  result  (Theorem  C.2) 
of  this  section. 


Theorem  C.l  Let  M  and  N  be  subspaces  of  a  Hilbert  space  TLwith  M  f]N  =  {0}. 
And  let  Pm  and  Pn  be  orthogonal  projectors  on  M  and  N  respectively.  Then 

inf  II  (Pm~Pn)x\\ 

xeM®N ,  ||*||#o  ||m 


Proof:  See  [Ste73] 


Theorem  C.2  Let  M  and  N  be  nontrivial,  closed  subspaces  of  a  Hilbert  space  'H. 
Let  {xj}  be  a  frame  for  M  with  frame  bounds  Am  and  Bm,  and  {yj}  be  a  frame  for 
N  with  frame  bounds  An  and  Bn-  Define  Q  =  Span({xj}  U  {2/j})- 
Let  0m  denote  the  minimum  angle  between  the  subspaces  M  and  N . 

Then  if,  9m  >  0  {xj,  yj}  is  a  frame  for  Q  with  frame  bounds 

Aq  >  Aq  -  mm(AM ,  An)  (1  -  cos  9m) 

Bq  <  Bq  =  max( #iv)  min(2,  - - ~~T~)  (3.3.3) 

1  COS  Ujji 

Proof: 

Note  that  M  f)  N  =  0  by  the  hypothesis  that  9m  >  0.  So  Q  =  M  0  N  and  thus  if 
g  £  Q  there  is  a  unique  decomposition  of  g.  as  g  —  x  +  y  where  x  £  M  and  y  £  N . 

Lower  Frame  Bound: 

Take  g  £  Q  and  let  Pm  and  Pn  he  orthogonal  projection  operators  onto  M  and 
N  respectively. 

J2\  <  g,xj  >  |2  +  |  <  g,yj  >  |2  =  X)  I  <  Pm 9’  xi  >  I2  +  I  <  pN9,  yj  >  I2 
j  i 

>  AM\\PMg\\2  +  An\\Pn9\\2 

>  min(AM,  An)  {||-PmsI|2  +  IlCwffH2}  (3.3.4) 
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Now 


\\Pm9\\2  +  IIP^II2  =  \\Pm9  -  Pn9 ||2  +  25fe  ( Pm9 ,  Pn9 ) 

Lf  \\(Pm-Pn)9V2 


> 


inf 

3 


\\gf-2\{PMg,PNg)\ 


M 

-  s'm2  dm\\g\\2 -2 \(PM9,PNg)\ 

>  sin2  0m||<7||2  -  2\\PMg\\\\PNg\\cosOn 


>  sin2  9 , 


M\2-(\\Pm9\\2  +  \\Pn9\\2)  cose. 


(3.3.5) 


Therefore, 


Or  equivalently, 


(\\Pm9W2  +  H-P/vtfl!2)  (1  +  cos 9rn )  >  sin2 6m\\g\\2.  (3.3.6) 


i2'"0—"2'  Sm‘0m  ",."2  -  i_  eos2  V||g||2  =  (i  -  cosflm)m2  (3.3.7) 


+  cos  9„ 


Thus  from  (3.3.4)  and  (3.3.7)  we  get 


Y2  I  <  9,  Xj  >  |2  +  |  <  g,  Vj  >  |2  >  min  (Am,  Ax)  (1  -  cos  9rn)  ||<;|| 


Upper  Frame  Bound: 


I  <  g,xj  >  I2  +  I  <  g,V]  >  I2  <  Bm\\Pm9\\2  +  Bn\\Pn9\\2 

i 

<  BM\\gf  +  BN\\g\\2  <  2max(5M,5jv)|[$jl8) 


Also, 


\\Pm9\\2  +  \\Pn9W2  =  || Pm9  ~  Pn9\\2  +  ‘2^e  (Pm 9,  Pn9 ) 

<  ||sf||2  +  23fte  (Pm9,Pn9) 

<  |bl|2  +  2II^V/firllll^llCos6,m 

<  Nl2  +  (\\Pm9\\2  +  II^atII2)  cos 0m 

(3.3.9) 
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Thus, 


(lin,9l|2  +  ||/V9||2)  <  ,  ‘  Hall2.  (3.3.10) 

v  7  1  cos  um 

So  we  also  have, 

>  |2  +  I  <  9,Vj  >  I2  <  ma x(Bm,Bn)  _  1  ||ffH'2  (3.3.11) 

•  i  COS  Urn 

Therefore  (by  (3.3.8)  and  (3.3.11)), 

Bq  <  Bq  =  max(i?Af,  BN)  min(2,  . - . ----) 

X  COS  (/ '[Yi 

■ 

It  should  be  noted  that  the  conditions  under  which  Theorem  C.2  guarantees  the 
union  of  two  frames  to  be  a  frame  for  their  combined  span,  are  only  sufficient  con¬ 
ditions.  In  fact  in  all  finite-dimensional  cases,  these  conditions  are  not  necessary. 
In  these  cases,  estimates  of  the  frame  bounds  can  be  made  using  knowledge  of  the 
correlations  ({x;,  £_,-))  among  the  frame  elements  themselves.  However  as  we  show  by 
example  in  the  next  section,  in  an  infinite-dimensional  setting,  the  union  of  two  frames 
can  fail  to  form  a  frame  if  6m  =  0. 

As  can  be  seen  from  Equation  (3.3.3),  the  estimate  Aq  of  Theorem  C.2  for  the 
lower  frame  bound  Aq  is  always  less  than  or  equal  to  min  (Am,  An).  We  now  show  that 
the  actual  lower  frame  bound  Aq  must  indeed  be  less  than  or  equal  to  min(.4jv/,  An)- 
To  show  this  we  first  prove  the  following  lemma. 

Lemma  C.l  Let  M  and  N  be  nontrivial,  closed  subspaces  of  a  Hilbert  space  H, 
M  p|  IV  =  {0}.  Define  Q  =  M  ®  N .  Then  Vx*  G  M,  3  g*  £  Q  such  that 

Pm9*  =  x*  and  Pn9*  —  0-  (3.3.12) 

In  particular, 

g*  =  (I-PN)(I-PMPN)-1x*, 

satisfies  (3.3.12). 
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Proof:  First  note  that  Vx  G  Q,  (I  —  Pn)%  G  A'  J"  H  Q  Semndly  since  ||Pa/Pv||  <  1. 
(/  -  PmPn)~ 1  exists  and  is  given  by 

OO 

(I  -  PmPn)~1x  =  y^(PMP;V)^.  (3.3.13) 

fc=o 

For  any  a;  G  M,  (/  —  PmPn)~1%  G  M  since  every  term  of  the  series  in  the  right  hand 
side  of  Equation  (3.3.13)  is  in  M  and  M  is  closed.  Now  let  x  —  {I  -  PmPn)~1x*  and 
let  g*  —  (/  —  PN)x.  Clearly  Pn9*  =  0.  Also, 

Pm9*  =  Pm(I-Pn)(I-  PmPn)~1x* 

=  Pm{I  -  PmPnT'x*  ~  PmPn(I  -  PmPn)~1x w 
=  (/  -  PmPnY'x*  -  PMPN{I  -  PmPnY'x* 

(since  (/  -  Pm  PnY1  x*  £ 

=  (I-PmPn)(I-PmPnY1x*  =  x''  ■  (3.3.14) 


Theorem  C.3  Let  M  and  N  be  subspaces  of  a  Hilbert  space  PL.  Let  {xj}  be  a  frame 
for  M  with  frame  bounds  Am  and  Bm  ,  and  {y3}  be  a  frame  for  N  with  frame  bounds 
An  and  Bn-  Let  9m  >  0  denote  the  minimum  angle  between  the  subspaces  M  and  N 
and  define  Q  =  M@N .  Then  if  Aq  is  the  lower  frame  bound  for  the  frame  {x3}  (J  {y3} 

of  Q, 

Aq  <  min(AM:^.w). 

Proof: 

Without  loss  of  generality,  assume  Am  <  An-  Thus  since  Am  is  the  lower  frame, 
bound  for  the  frame  {xj}  of  M,  for  any  e  >  0,  3  x*  G  M  such  that 

D<*Vi>l2  +  e>M3- 

By  Lemma  C.l  B  g*  G  Q  such  that  Pm9*  —  and  Pn9*  =  0. 
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Therefore, 


E\(M\2 +  E\(s’,v,)\2 

=  EKflx^^l’  +  EKW.uM* 

=  r  ^v)!2  +  o 

<  (Am  +  <)U*12  <  [  Am  +  £)|!j'||2 

Thus  Aq  <  Am  +  e  and  since  e  >  0  is  arbitrary  we  have  that  Aq  <  Am  = 

min (Am,  An).  u 

C.3.2  Examples 

In  this  section  we  consider  a  few  finite-dimensional  examples  in  which  the  frame 
bounds  can  be  explicitly  computed.  The  general  methodology  in  these  examples  is  as 
follows. 

Let  T  :  H  — >  £2  be  defined  such  that  T  :  /-*{</,  Xj  >}  where  {zj}  is  a  frame 
for  H.  Therefore  the  frame  operator  5  =  T*T.  If  we  let  {e;  }  be  an  orthonormal  basis 
for  7i  then  the  matrix  representation  of  T  with  respect  to  this  basis  is  given  by 

W  =  [wij]  =  [(«,-,  ej)\ . 

Hence  the  upper  and  lower  frame  bounds  are  given  by  the  upper  and  lower  spectral 
limits  of  W*W.  In  the  finite  dimensional  case,  the  frame  bounds  can  be  computed  as 
the  maximum  and  minimum  eigenvalues  of  W*W  or  equivalently,  the  squares  of  the 
maximum  and  minimum  singular  values  of  T. 

Example  I:  A  frame  for  1R2  from  frames  for  1-D  subspaces 

Let,  x  =  (1,0)T  and  y  =  (sin0,  cos#)T.  Define  M  =  Span{x},  N  =  Span{t/};  so  x 
is  a  frame  for  M  with  frame  bounds  Am  =  Bm  =  1  and  y  is  a  frame  for  N  with 
frame  bounds  Hyv  =  B /v  =  1.  In  this  case  9m  =  9.  Clearly  for  any  angle  9  >  0, 
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Spanjx,  j/}=IR'  =  Q.  Using  the  standard  orthonoruial  basis  for  IR2,  we  get 


W7  = 


1  0 
sin  9  cp' d 


Hence 

Aq  --  A  min(W*W)  =  1  —  cos  9 
Bq  =  Amax(LF’,‘VF)  =  1  4-  cos  9 

Since  here  the  lower  frame  bound  is  equal  to  the  lower  frame  bound  estimate  of  The¬ 
orem  C.2,  in  this  case  Theorem  C.2  provides  both  necessary  and  sufficient  conditions. 
Figure  C.2  shows  the  actual  upper  and  lower  frame  bounds  for  this  example. 


Upper  and  Lower  Frame  Bounds  For  2-D  Example 
2| . .  ■ - . - - - , - > - . - . - 

1.8- 

1.6  ■ 

1-4-  ’ 

1.2  -  - 
1  - 
0.8- 
0.6- 

0.4-  ” 

0.2-  _ 

o» - - " - * - * - - - 1 - * - * - 

0  10  20  30  40  50  60  70  80  90 


subspace  angle  (degrees) 


Figure  C.2:  Actual  upper  and  lower  frame  bounds  for  two-dimensional  example 


Example  II:  A  Frame  for  IR3  from  frames  for  IR2  and  IR 


Let  x\  =  (1,0, 0)T,  x-2  =  (0,1, 0)r  and  y  =  (cosw  cos  9,  sinw  cos  9,  sin  9)T .  Let  M  = 
Spanjaq,  0:2}  and  N  =  Span{y}.  Here  9m  =  9.  So  for  any  9  >  0  M  ©  N  =  IR3.  For 
this  example, 


1  0  0 


W  = 


0 


0 


cos  lo  cos  9  sin  w  cos  9  sin  9 
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Ill  this  example  as  well,  we  have, 


Aq  =  Xmia(W*W)  =  1  -  cos  0 
Bq  =  Amax(IPW)  -  1  +  COS0 


Example  III:  Other  Frames  for  1R3  from  frames  for  IR2  nd  IR. 


Let  X\  =  (1,0, 0)T,  X‘2  =  (cos7,sin7,  0)T  and  y  =  (cosu?  cos  0,  sinu  cos  9,  sin  9)T .  Let 
M  =  Span{xi, X2}  and  N  =  Span{i/}.  Here  An  =  1,  Am  =  1  —  cos 7  and  0m  =  9.  So 
for  any  9  >  0  M  ®  IV  =  IR3.  So, 


W  = 


1  0  0 

cos  7  sin  7  0 

cos  u>  cos  9  sin  uj  cos  9  sin  9 


In  this  case,  analytical  expressions  for  the  eigenvalues  of  W*W  are  quite  complicated. 
Therefore  we  shall  demonstrate  the  lower  frame  bounds  numerically  for  a  few  values 
of  7  and  ui.  Figures  C.3-C.7  each  show  for  a  particular  value  of  7,  the  actual  lower 
frame  bounds  and  the  estimate  provided  by  Theorem  C.2  for  different  values  of  u>. 


Figure  C.3:  Example  III  with  7  =  7t/4.  Solid  Line:  Lower  frame  bound  estimate; 
Dashed  lines:  Actual  lower  frame  bound  for  different  values  of  u> 

It  can  be  seen  that  the  lower  frame  bound  estimate  becomes  increasingly  accurate  as  7 
approaches  7t/2.  For  small  7  the  estimate  is  quite  conservative  for  certain  values  of  u, 
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Figure  C.4:  Example  III  with  7  =  7r/3.  Solid  Line:  Lower  frame  bound  estimate; 
Dashed  lines:  Actual  lower  frame  bound  for  different  values  of  u 

however  in  these  cases,  there  also  exist  values  of  u  for  which  the  lower  frame  bound  is 
close  to  the  estimate  of  Theorem  C.2.  In  this  loose  sense,  the  estimate  of  Theorem  C.‘2 
is  as  good  an  estimate  as  can  be  derived  using  knowledge  of  the  minimum  subspace 
angle  alone. 


Figure  C.5:  Example  III  with  7  =  tt/6.  Solid  Line:  Lower  frame  bound  estimate; 
Dashed  lines:  Actual  lower  frame  bound  for  different  values  of  u 
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0.7, 


Figure  C.6:  Example  III  with  7  =  37r/8.  Solid  Line:  Lower  frame  bound  estimate; 
Dashed  lines:  Actual  lower  frame  bound  for  different  values  of  u 

Example  IV:  Violation  of  Lower  Frame  Bound  when  9m  —  0 

By  this  infinite-dimensional  example  (which  can  be  found  in  [Gil9 1] )  we  show  that 
the  lower  frame  bound  can  indeed  be  zero  in  the  case  where  0m  —  0. 

Let  {ej}  be  the  standard  orthonormal  basis  for  l1  and  let  V-’j  =  e3j>  fij  = 
\J  1  -  1 /7  e,3 j  +  >/l/j  e3j+1.  Thus  the  sequences  and  { <f>j }  are  orthonormal 


Figure  C.7:  Example  III  with  7  =  7tt/16.  Solid  Line:  Lower  frame  bound  estimate; 
Dashed  hues:  Actual  lower  frame  bound  for  different  values  of  u> 
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sequences  and  thereby  frames  for  their  respective  closed  spans.  However  if  we  consider 
the  union  of  the  two  frames  and  take  63^+1  G  SpanjVv?  as  a  test  vector,  it  is  easily 
seen  that 

Ke3*+i>  V,jr)|2  +  VMeq+uh)?  =  £  =  ^Ile3i:+i||2- 

3  3 

Hence  since  j  — >  0  as  k  — *■  00,  the  sequence  {V’j,  0j}  is  not  a  frame  for  its  span. 

C.3.3  Discussion 

In  this  section  we  have  provided  a  geometric  characterization  of  conditions  which 
guarantee  that  the  union  of  two  frames  is  a  frame  for  the  appropriate  direct  sum 
space.  The  main  result  of  this  section  is  contained  in  Theorem  C.2  which  says  that 
given  frames  for  subspaces  M  and  N ,  the  union  of  the  frames  is  a  frame  for  the  direct 
sum  space  M  0  N  provided  that  the  minimum  angle  between  the  two  subspaces  is 
bounded  away  from  zero.  An  estimate  for  the  lower  frame  bound  can  be  made  in 
terms  of  the  quantity  1  —  cos#m.  As  mentioned  in  Section  C.3.2,  the  lower  frame 
bound  estimate  in  Theorem  C.2  is  in  a  sense  the  best  estimate  which  can  be  made 
using  the  minimum  subspace  angle  alone.  Furthermore,  we  have  shown  that  the  lower 
frame  bound  is  nonincreasing  with  respect  to  the  lower  frame  bounds  for  the  original 
sub  spaces. 
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